ABSTRACT 

We introduce a method for constructing end-to-end mock galaxy catalogues using a semi- 
analytical model of galaxy formation, applied to the halo merger trees extracted from a cos- 
mological N-body simulation. The mocks that we construct are lightcone catalogues, in which 
a galaxy is placed according to the epoch at which it first enters the past lightcone of the ob- 
server, and incorporate the evolution of galaxy properties with cosmic time. We determine the 
position between the snapshot outputs at which a galaxy enters the observer's lightcone by in- 
terpolation. As an application, we consider the effectiveness of the BzK colour selection tech- 
nique, which was designed to isolate galaxies in the redshift interval 1.4 < z < 2.5. The mock 
catalogue is in reasonable agreement with the observed number counts of all BzK galaxies, as 
well as with the observed counts of the subsample of BzKs that are star-forming galaxies. We 
predict that over 75 per cent of the model galaxies with $K_{\rm AB} \leq 23$, and 1.4 < z < 2.5, are 
selected by the BzK technique. Interloper galaxies, outside the intended redshift range, are 
predicted to dominate bright samples of BzK galaxies (i.e. with $K_{\rm AB} \leq 21$). Fainter K-band 
cuts are necessary to reduce the predicted interloper fraction. We also show that shallow B- 
band photometry can lead to confusion in classifying BzK galaxies as being star-forming or 
passively evolving. Overall, we conclude that the BzK colour selection technique is capable of 
providing a sample of galaxies that is representative of the 1.4 < z < 2.5 galaxy population. 
1 INTRODUCTION 

Modern galaxy surveys such as the Sloan Digital Sky Survey 
(SDSS, York et al. 2000) and the 2-degree Field Galaxy Redshift 
Survey (2dFGRS, Colless et al. 2001, 2003) have revolutionised 
our view of the galaxy distribution and have played a key role in 
shaping the constraints on our cosmological model (e.g. Norberg 
et al. 2001, 2002, Zehavi et al. 2005, Cole et al. 2005, Sanchez 
et al. 2006, Tegmark et al. 2006, Sanchez et al. 2009, Zehavi 
et al. 2011, Sanchez et al. 2012). The size of these surveys has heralded 
the start of an era of precision cosmology wherein we can mea- 
sure statistics, such as the galaxy luminosity function, with random 
errors that are smaller than the systematic errors. To continue to 
make progress it is essential that we improve our understanding of 
how the estimation of such statistics is affected by the construction 
of a galaxy survey and the selection criteria applied. Mock galaxy 
catalogues, which mimic the selection effects in real surveys, have 
emerged as an essential tool with which to achieve this aim, and 
play a central role in the analysis and exploitation of galaxy sur- 
veys. 

When working with an observational galaxy catalogue, an es- 
timator designed to recover a statistic, such as the luminosity func- 
tion or correlation function, will have to compensate for a variety 
of effects such as non-uniform coverage of the sky and a selection 
function that varies strongly with radial distance from the observer. 
The primary advantage of a mock catalogue is that, by construc- 
tion, we already know the 'true' answer for the statistic without 
these effects. By comparing a measurement extracted from a syn- 
thetic mock catalogue with the ideal result (i.e. the statistic mea- 
sured using a complete sample of galaxies from the original sim- 
ulation cube), one can adjust and tune the performance of the es- 
timator to reduce any systematic effects. A prime example is that 
of algorithms designed to find groups of galaxies, the calibration of 
which requires foreknowledge of the underlying dark matter halo 
distribution in order to test how faithfully the algorithm can recover 
these structures when working in redshift space (Eke et al. 2004, 
Robotham et al. 2011, Murphy et al. 2012). Additionally, mock cat- 
alogues can be used to forecast the scientific return of future galaxy 
surveys (Cole et al. 1998, Cai et al. 2009, Orsi et al. 2010). There- 
fore they can help shape the design of a survey by assessing the 
level, and quality, of the statistics recoverable with any particular 
configuration. Finally, mock catalogues allow us to cast the predic- 
tions of theoretical models of galaxy formation in a form that can 
be directly compared against observables. 

In this paper we present a method for building mock cata- 
logues for galaxy surveys, which can cover any redshift range. 
Mock catalogues constructed in this way have already been used 
extensively by the Galaxy And Mass Assembly (GAMA) survey 
(Driver et al. 2009, see also Robotham et al. 2011, Alpaslan et al. 
2012). Here, we illustrate the power of mock catalogues by evaluat- 
ing the performance of the BzK colour selection technique, which 
was designed to isolate galaxies in the redshift interval 1.4 < z < 
2.5 (Daddi et al. 2004a). This redshift range is an exciting one for 
galaxy assembly, since it is thought that most of the stellar mass 
of many of the progenitors of present day massive galaxies formed 
during this period (Madau et al. 1998, Dickinson et al. 2003). Un- 
fortunately, this epoch lies within the 'redshift desert' where the 
spectroscopic measurement of galaxy redshifts is difficult due to 
the lack of strong spectral features at optical wavelengths. Only re- 
cently, with the development of near-infrared (NIR) spectroscopy, 
have large galaxy surveys begun to probe this region and to assess 
the build-up of galaxies over this crucial period (e.g. Franx et al. 
2003, van Dokkum et al. 2003). 

Prior to this, knowledge of the galaxy population in the 
'desert' was derived from photometry. This led to the development 
of colour selection techniques designed to efficiently identify tar- 
gets for spectroscopic follow-up (which is much more expensive). 
A well-known example of this is the Lyman-break dropout tech- 
nique, proposed by Steidel et al. (1996, 2003, 2004), which iden- 
tifies star-forming galaxies at z ~ 3-10 according to their rest- 
frame ultraviolet colours and sampling of the Lyman break spectral 
feature. Other examples include extremely red objects at z ~ 1 (El- 
ston et al. 1988, McCarthy 2004) and distant red galaxies at z ~ 2 
(Franx et al. 2003). 

A popular photometric technique, designed to simultaneously 
identify populations of star-forming and passively evolving galax- 
ies, is the BzK colour-criterion (Daddi et al. 2004a). This approach, 
which selects galaxies based on their (B - z) and (z - K) colours, 
is designed to deliver galaxy samples within the redshift range 
1.4 < z < 2.5 that are not biased by the presence of dust or 
by the age of their stellar populations (Kong et al. 2006, Hayashi 
Early studies of Kab ^ 2^BzK-selected galaxies revealed 
thors to speculate that BzK galaxies are the high-redshift precur- 
sors of massive early-type galaxies found in groups and clusters at 
the present day. The key question we address here is: Are the prop- 
erties of the bright galaxies identified by this selection technique 
representative of the overall population with 1.4 < z < 2.5, or 
are we really just seeing a special subclass of galaxies? If the latter 
is true, is this simply because current observations have not been 
sufficiently deep to see the fainter, more representative galaxies or 
is the BzK criterion somehow biased towards selecting a subset of 
the galaxy population? A galaxy mock catalogue is a vital resource 
in helping to answer these questions by allowing an assessment of 
the effects of observational selection on the completeness of the 
galaxy sample as well as an examination of whether or not the BzK 
criterion is sensitive to the intrinsic properties of galaxies. 

The layout of the paper is as follows: In section 2 we sum- 
marise the various methods used to construct mock galaxy cata- 
logues. In section 3 we introduce the numerical simulation and 
galaxy formation model that we will use as inputs, before, in sec- 
tion 4, providing further details of our method for constructing 
lightcone mock catalogues. Section 4 provides a full overview of 
the lightcone construction, including the assignment of positions to 
model galaxies. In section 5 we use a lightcone mock catalogue 
to assess the performance of the BzK selection technique. Note 
that this application only uses some features of the lightcone; the 
clustering of BzK galaxies will be dealt with in a separate paper. 
Finally, in section 6 we summarise our method and present our 
conclusions. Throughout this paper we use magnitudes in the AB 
system. 



2 CONSTRUCTING MOCK CATALOGUES 

In this section we provide an overview of the basic procedure for 
constructing mock catalogues and set out the advantages of using a 
semi-analytical model of galaxy formation. 



2.1 Overview of the technique 

A very basic mock catalogue could be constructed by randomly 
sampling one of the measured statistical distributions that de- 
scribe the galaxy population (e.g. the luminosity function or stellar 
mass function). Although the resulting catalogue of galaxies would 
match that particular statistic (by construction), without any fur- 
ther information about the galaxies, such as their colour or spatial 
distribution, the mock would be very limited. 
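
As a minimal illustration of such a basic mock, the sketch below draws galaxy luminosities from a luminosity function by inverse-transform sampling. The Schechter form, the faint-end slope of -1.2, the luminosity limits and the function name are illustrative assumptions, not values taken from this paper.

```python
import bisect
import math
import random


def sample_schechter(n, alpha=-1.2, l_min=0.01, l_max=20.0, n_grid=2000, seed=1):
    """Draw n luminosities (in units of L*) from phi(L) ~ L**alpha * exp(-L)."""
    rng = random.Random(seed)
    # Tabulate the unnormalised distribution on a logarithmic grid.
    step = (math.log(l_max) - math.log(l_min)) / (n_grid - 1)
    grid = [l_min * math.exp(i * step) for i in range(n_grid)]
    pdf = [lum ** alpha * math.exp(-lum) for lum in grid]
    # Cumulative integral by the trapezium rule, normalised to end at 1.
    cdf = [0.0]
    for i in range(1, n_grid):
        cdf.append(cdf[-1] + 0.5 * (pdf[i] + pdf[i - 1]) * (grid[i] - grid[i - 1]))
    cdf = [c / cdf[-1] for c in cdf]
    samples = []
    for _ in range(n):
        u = rng.random()
        # Locate the grid cell with cdf[j - 1] <= u < cdf[j] and interpolate.
        j = min(max(bisect.bisect_right(cdf, u), 1), n_grid - 1)
        frac = (u - cdf[j - 1]) / (cdf[j] - cdf[j - 1])
        samples.append(grid[j - 1] + frac * (grid[j] - grid[j - 1]))
    return samples


luminosities = sample_schechter(5000)
```

By construction the sampled luminosities reproduce the input luminosity function, but they carry no colours, positions or evolution, which is the limitation noted above.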

Building a more realistic mock catalogue, with positional in- 
formation and including other galaxy properties and their evolu- 
tion, requires the use of a numerical simulation which follows the 
growth of structure in the dark matter. The procedure for construct- 
ing mock catalogues from a numerical simulation can be broken 
down into the following steps: (i) generate a population of galax- 
ies either empirically or using a physical model, using either the 
dark matter distribution or dark matter halos, (ii) place these galax- 
ies into a cosmological volume, (iii) apply the angular and radial 
selection functions of the survey. 



2.1.1 Generating a galaxy population 

To generate a population of galaxies one must first model the dis- 
tribution of dark matter, which is often done with an N-body simu- 
lation. Dark-matter-only N-body simulations allow us to build halo 
populations using gravity alone. The full spatial information pro- 
vided by N-body simulations allows one to extract clustering infor- 
mation, which would not be available if a Monte-Carlo 
approach were used. Additionally, the merger histories of ha- 
los in N-body simulations will also include environmental effects, 
such as halo assembly bias (Gao et al. 2005). 

The way in which dark matter halos are populated with galax- 
ies is where the methods of mock catalogue construction can differ. 
Blaizot et al. (2005) (see also Baugh 2008) summarise several of 
the different methods available, which include using phenomeno- 
logical models to assign galaxies to dark matter particles in the sim- 
ulation (e.g. Cole et al. 1998) or using empirically derived statis- 
tics, such as the halo occupation distribution (HOD, Berlind & 
Weinberg 2002, Song et al. 2012) or sub-halo abundance matching 
(Vale & Ostriker 2004). Other, more physical approaches are also 
possible. For instance, one could include the baryons in the original 
simulation, using either a grid-based or particle-based method to 
solve the hydrodynamical equations. The problem with direct hy- 
drodynamical simulations, however, is that they are computationally 
expensive and so, in practice, are restricted to small volumes (e.g. 
the $25\,h^{-1}\,{\rm Mpc}$ and $100\,h^{-1}\,{\rm Mpc}$ boxes used in the Overwhelm- 
ingly Large Simulations project of Schaye et al. 2010). 

A powerful approach, which we adopt here, is to use a semi- 
analytical model of galaxy formation to populate the halo merger 
trees extracted from a high resolution, cosmological N-body simu- 
lation (Diaferio et al. 1999, Benson et al. 2000, Blaizot et al. 2005, 
Kitzbichler & White 2007, Sousbie et al. 2008, Overzier et al. 
2009, Cai et al. 2009, Henriques et al. 2012, Overzier et al. 2012). 
Modelling of various physical processes, such as the cooling of gas 
within dark matter halos, is necessary to follow the baryonic com- 
ponent and predict the fundamental properties of galaxies, such as 
their stellar mass and star formation history. The adoption of an 
initial mass function (IMF), a stellar population synthesis (SPS) 
model and a treatment of dust extinction allows these fundamental 
properties to be connected with observables, thus enabling a direct 
comparison between observations and the predictions of the galaxy 
formation model. 



2.1.2 Generating a cosmological volume 

Current and future galaxy surveys are designed to probe ever larger 
cosmological volumes. As a result there is a growing demand for 
simulations with boxes of sufficient size to match the volumes of 
these surveys. Unfortunately, current computing power means that 
a compromise must often be made between the volume of the sim- 
ulation box and the resolution at which the simulation is carried 
out. Therefore a sufficiently large cosmological volume can only be 
sampled by tiled replication of a smaller box simulation. For very 
shallow galaxy surveys (e.g. with a median redshift z < 0.05), the 
lookback time is sufficiently small that typical galaxy properties 
will not have undergone significant evolution across the redshift 
interval covered by the survey. In these instances, the statistics of 
the galaxy population at the extremes of the survey will not be too 
dissimilar to the statistics today and so one can build a mock cata- 
logue using galaxies from a single simulation snapshot. However, 
for very deep galaxy surveys which cover a significant lookback 
time, we would expect to see substantial evolution in galaxy prop- 
erties and in the growth of large-scale structure. Therefore more 
sophisticated mock catalogues, that tile the survey volume using 
many different simulation snapshots, are required to adequately re- 
produce the evolution seen in the properties of galaxies and their 
clustering. The mock catalogues that we construct in this work are 
lightcone mock catalogues, in which galaxies are placed according 
to the epoch at which they first cross the observer's past lightcone, 
i.e. at the location at which the light emitted from the galaxy has 
just enough time to reach the observer, and thus incorporate the 
evolution of structure with cosmic time. 
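
The crossing condition can be illustrated with a short sketch. It assumes an observer at the origin, a flat cosmology with a matter density parameter of 0.25, and linear interpolation between two adjacent snapshots of both the galaxy position and the lightcone radius. The function names are hypothetical, and this is not necessarily the interpolation scheme used for the mocks themselves.

```python
import math

C_OVER_H0 = 2997.92458  # c / (100 km/s/Mpc), i.e. the Hubble distance in h^-1 Mpc


def comoving_distance(z, omega_m=0.25, n_steps=1000):
    """Comoving distance to redshift z in h^-1 Mpc for a flat universe."""
    if z <= 0.0:
        return 0.0

    def inv_e(zz):
        return 1.0 / math.sqrt(omega_m * (1.0 + zz) ** 3 + 1.0 - omega_m)

    dz = z / n_steps
    total = 0.5 * (inv_e(0.0) + inv_e(z))
    total += sum(inv_e(i * dz) for i in range(1, n_steps))
    return C_OVER_H0 * total * dz  # trapezium rule


def lightcone_crossing(pos_early, pos_late, z_early, z_late, tol=1e-8):
    """Return (s, position) where the galaxy crosses the lightcone, or None.

    s = 0 is the earlier (higher redshift) snapshot and s = 1 the later one.
    Positions are comoving, in h^-1 Mpc, relative to the observer.
    """
    r_early = comoving_distance(z_early)
    r_late = comoving_distance(z_late)

    def position(s):
        return [a + s * (b - a) for a, b in zip(pos_early, pos_late)]

    def offset(s):
        # Positive when the galaxy lies outside the shrinking lightcone radius.
        radius = math.sqrt(sum(x * x for x in position(s)))
        return radius - (r_early + s * (r_late - r_early))

    if offset(0.0) > 0.0 or offset(1.0) <= 0.0:
        return None  # no crossing between these two snapshots
    lo, hi = 0.0, 1.0
    while hi - lo > tol:  # bisection on the sign change of offset(s)
        mid = 0.5 * (lo + hi)
        if offset(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    s = 0.5 * (lo + hi)
    return s, position(s)
```

A galaxy is assigned to the pair of snapshots for which a crossing is found, and the returned fraction s can then be used to interpolate its other properties to the crossing epoch.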

Finally, observations will be subject to uncertainties or biases, 
introduced as a result of survey design or selection effects, and so 
to properly relate theoretical predictions to observations we must 
subject the simulated data to the same selection functions as the 
observed galaxy sample. 



2.2 Why use a semi-analytical galaxy formation model? 

Modelling the formation of galaxies is a difficult task. Part of the 
problem is that our knowledge of the underlying physics is limited 
and so we cannot simply write down a precise formulation for ev- 
ery process. Furthermore, despite the continued development of di- 
rect, hydrodynamic simulations, current computational capabilities 
mean that many of the relevant processes (for example star forma- 
tion or supernova feedback) remain firmly below the resolution lim- 
its of direct simulations and can only be addressed through "sub- 
grid" physics. Semi-analytic models describe the sub-grid physics 
using physically motivated, parametrised equations that follow the 
evolution of baryons trapped in the gravitational potential wells of 
and Benson 2010). There are several compelling advantages to us- 
ing semi-analytic models for building mock catalogues: 

(i) The development of deep, wide-field photometric galaxy sur- 
veys spanning large cosmological volumes has led to demand for 
large (suites of) mock catalogues that can be constructed rapidly 
and accurately. Semi-analytic modelling is currently the only phys- 
ical approach that meets these ideals: such models are capable of 
populating large cosmological volumes with galaxies much faster 
and at a lower computational cost than is currently possible with 
hydrodynamical simulations. 

(ii) The modular design of semi-analytic models allows new 
physics to be incorporated readily. Combined with their short run- 
time, this means that semi-analytic models can be tuned to match 
observations quickly, in response to a change to the background 
cosmology or to the galaxy formation physics. Moreover, the larger 
computational box that can be used in the N-body and semi- 
analytical approach, compared with a hydrodynamical simulation, 
means that the clustering predictions are robust out to larger scales. 

(iii) Empirically motivated methods, such as HOD modelling, 
must first be calibrated against observational data (e.g. the Las- 
Damas mock catalogues, McBride et al. 2009). Hence, mock cat- 
alogues built using such methods are limited by the availability 
of observational data at high redshift. Furthermore, the data that 
are available may be affected by sample variance, leading to un- 
representative HOD parameters being fitted. Semi-analytic mod- 
els, however, once tuned to fit the observations of galaxies at low 
redshift, can predict galaxy properties out to high redshift, with- 
out further observational input. Mock catalogues built from semi- 
analytical models are therefore much more flexible than catalogues 
constructed using other methods. 

(iv) The next generation of galaxy surveys will map the sky 
across a large portion of the electromagnetic spectrum, with multi- 
wavelength follow-up observations resulting in potentially complex 
survey selection functions, such as for the Galaxy And Mass As- 
sembly (GAMA) Survey (Driver et al. 2011). Ideally, mock cata- 
logues for future surveys need to provide a diverse range of galaxy 
properties as well as providing the capability to select galaxies si- 
multaneously in multiple bands. Semi-analytic models follow the 
complete star formation history for each galaxy and so can pre- 
dict many different galaxy properties. Mock catalogues based upon 
semi-analytic models are already capable of mimicking sophisti- 
cated multi-band selection criteria. 



3 GALAXY FORMATION MODEL 

The model we adopt to generate the galaxy population for our 
mock catalogues is the Bower et al. (2006) variant of the GALFORM 
semi-analytic model (Cole et al. 2000). To build realistic lightcone 
mock catalogues we require spatial information, so we use dark 
matter halo merger trees extracted from the Millennium Simulation 
(Springel et al. 2005). 

3.1 The Millennium Simulation 

3.1.1 Cosmology and parameters 

The population of dark matter halos for our mock catalogues is 
provided by the Millennium Simulation, a $2160^3$ particle N-body 
simulation of the $\Lambda$CDM cosmology carried out by the Virgo Con- 
sortium (Springel et al. 2005). This simulation follows the hierar- 
chical growth of cold dark matter structures from redshift z = 127 
through to the present day in a cubic volume of size $500\,h^{-1}\,{\rm Mpc}$ 
on a side. Halo merger trees are constructed using particle and 
halo data stored at 64 fixed epoch snapshots that are spaced ap- 
proximately logarithmically in expansion factor. The Millennium 
trees have a temporal resolution of approximately 0.26 Gyr at the 
present day, with approximate resolutions of 0.38, 0.35, 0.26 Gyr 
at redshifts z = 0.5, 1, 2 respectively. Halos in the simulation 
are resolved with a minimum of 20 particles, corresponding to a 
halo resolution of $M_{\rm halo,lim} = 1.72 \times 10^{10}\,h^{-1}\,{\rm M_\odot}$, significantly 
smaller than expected for the Milky Way's dark matter halo. 

The cosmological parameters adopted in the Millennium Sim- 
ulation are: a baryon matter density $\Omega_{\rm b} = 0.045$, a total matter den- 
sity $\Omega_{\rm m} = \Omega_{\rm b} + \Omega_{\rm CDM} = 0.25$, a dark energy density $\Omega_\Lambda = 0.75$, 
a Hubble constant $H_0 = 100\,h\,{\rm km\,s^{-1}\,Mpc^{-1}}$ where h = 0.73, 
a primordial scalar spectral index $n_{\rm s} = 1$ and a fluctuation ampli- 
tude $\sigma_8 = 0.9$. These parameters were chosen to match the cos- 
mological parameters estimated from the first year results from the 
Wilkinson Microwave Anisotropy Probe (Spergel et al. 2003). 
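
As a consistency check, the halo mass resolution quoted above follows from the box size, the particle number and the total matter density. The short calculation below assumes $2160^3$ particles and the standard value of the critical density, which is not quoted in the text; the variable names are ours.

```python
RHO_CRIT = 2.775e11  # critical density today in h^2 Msun Mpc^-3 (standard value, assumed)

omega_m = 0.25           # total matter density parameter
box_size = 500.0         # box side length in h^-1 Mpc
n_particles = 2160 ** 3  # number of simulation particles
min_particles = 20       # minimum number of particles in a resolved halo

# Mean matter density times box volume, shared equally between the particles.
particle_mass = omega_m * RHO_CRIT * box_size ** 3 / n_particles  # h^-1 Msun
halo_mass_limit = min_particles * particle_mass                   # h^-1 Msun

print(f"particle mass   = {particle_mass:.3e} h^-1 Msun")
print(f"halo mass limit = {halo_mass_limit:.3e} h^-1 Msun")
```

This gives a particle mass of roughly $8.6 \times 10^{8}\,h^{-1}\,{\rm M_\odot}$ and a 20-particle limit close to $1.72 \times 10^{10}\,h^{-1}\,{\rm M_\odot}$.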

3.1.2 Construction of halo merger trees 

To construct the halo merger trees one must first identify groups 
of dark matter particles in each of the simulation snapshots. This 
is done using the Friends-Of-Friends algorithm (FOF, Davis et al. 
1985). The Millennium Simulation was carried out with a specially 
modified version of the GADGET-2 code (Springel 2005) with a 
built-in FOF group-finder, allowing FOF groups to be identified 
on the fly. The algorithm SUBFIND (Springel et al. 2001) was then 
used to identify self-bound, locally over-dense sub-groups within 
the FOF groups. This procedure typically results in the bulk of the 
mass of a FOF group being assigned to one large sub-group which 
represents the background mass distribution of the halo. The re- 
maining mass is usually split between smaller satellite sub-groups 
orbiting within the halo and unbound "fuzz" particles which are not 
associated with any sub-group. 

However, it is not uncommon for the FOF algorithm to join 
together structures which might be better considered to be sepa- 
rate halos for the purposes of semi-analytic galaxy formation. For 
example, nearby groups may be linked by tenuous "bridges" of par- 
ticles or they may only temporarily be joined. The merger tree al- 
gorithm we use in this work is intended to deal with these cases and 
ensure that the resulting trees are strictly hierarchical, i.e. once two 
halos are deemed to have merged they should remain merged at all 
later times. 

The first step in the construction of the merger trees is to iden- 
tify a descendant for each sub-group at the next snapshot. The de- 
scendant of each sub-group is identified as the sub-group at the next 
snapshot that contains the largest number of the $N_{\rm link}$ most bound 
particles, where 

$$N_{\rm link} = \max\left(f_{\rm trace} N_{\rm p},\, N_{\rm linkmin}\right). \qquad (1)$$

Here $N_{\rm p} \geq 20$ is the number of particles in the sub-group, as already 
stated, and $f_{\rm trace}$ and $N_{\rm linkmin}$ are set 
to 0.1 and 10 respectively. Defining $N_{\rm link}$ in this way means that 
in well resolved cases we follow the most bound "core" of the sub- 
group, which is important for satellite sub-groups which may be 
tidally stripped of their outer parts. For the smallest groups with 
$N_{\rm p} \approx 20$, $N_{\rm link} = 10$ so we are following up to 50% of the par- 
ticles and so preventing inaccurate assignment due to low number 
statistics. 
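
Equation (1) and the descendant rule can be sketched as follows. The function names, the data layout (a list of particle IDs ordered from most to least bound, and a mapping from particle ID to sub-group at the next snapshot) and the rounding of the product to an integer are assumptions made for illustration only.

```python
from collections import Counter


def n_link(n_p, f_trace=0.1, n_linkmin=10):
    """Number of most bound particles to follow, as in equation (1)."""
    # Rounding to the nearest integer is an assumption of this sketch.
    return max(int(round(f_trace * n_p)), n_linkmin)


def find_descendant(ids_by_binding, next_subgroup_of, f_trace=0.1, n_linkmin=10):
    """Return the sub-group at the next snapshot holding most of the core.

    ids_by_binding   : particle IDs ordered from most to least bound
    next_subgroup_of : dict mapping particle ID -> sub-group ID at the next
                       snapshot (particles in no sub-group are absent)
    """
    core = ids_by_binding[:n_link(len(ids_by_binding), f_trace, n_linkmin)]
    votes = Counter(next_subgroup_of[p] for p in core if p in next_subgroup_of)
    if not votes:
        return None  # no descendant found at the next snapshot
    return votes.most_common(1)[0][0]
```

For a 20-particle sub-group the ten most bound particles are followed, whereas for a 1000-particle sub-group only the 100 most bound particles decide the descendant, so stripped outer particles do not affect the assignment.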

The SUBFIND algorithm occasionally temporarily "loses" a 
sub-group between snapshots. For example, a sub-group may be 
identified at snapshot i, lost at one or more subsequent snapshots, 
and then identified again at snapshot i + n, where n > 1. This can 
happen if a small, isolated group briefly falls below the resolution 
limit or if a satellite sub-group passes close to the centre of its host 
halo. In either case we would like to identify the sub-group at snap- 
shot i + n as the descendant of the sub-group at snapshot i. Our 
approach to achieve this aim is as follows: 

(i) Identify sub-groups which may have been lost by SUBFIND. 

(ii) Identify sub-groups which may have just been reacquired by 
SUBFIND. 

(iii) Attempt to locate descendants of the sub-groups in (i) with 
the sub-groups in (ii). 

Groups which are "lost" are identified by looking for groups 
which either have no immediate descendant or are not the most 
massive progenitor of their immediate descendant. Some of these 
groups will have been lost because they have genuinely been dis- 
rupted and absorbed into the parent halo, but some will reappear 
later. Groups which have just been reacquired are identified by 
looking for "orphan" sub-groups, i.e. groups with no immediate 
progenitors. 

For each lost sub-group at snapshot i (where i is not the 
present day), we examine the orphan sub-groups at snapshot i + 2, 
i + 3, ... , i + $N_{\rm step}$. An orphan sub-group is identified as the de- 
scendant of a lost sub-group if at least a fraction $f_{\rm link}$ of the $N_{\rm link}$ 
most bound particles from the lost group are in the orphan group 
and no orphan descendant can be found at earlier snapshots. We 
usually set $N_{\rm step} = 5$ and $f_{\rm link} = 0.5$. 
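
A minimal sketch of this search is given below. The data layout and function name are hypothetical, and the choice of the orphan with the largest overlap when several qualify at the same snapshot is an assumption that the text does not specify.

```python
def match_lost_subgroup(core_ids, orphans_by_snapshot, i, n_step=5, f_link=0.5):
    """Search snapshots i+2 ... i+n_step for an orphan descendant.

    core_ids            : the most bound particle IDs of the lost sub-group
    orphans_by_snapshot : dict snapshot -> {orphan ID: set of particle IDs}
    Returns (snapshot, orphan ID) for the earliest match, or None.
    """
    needed = f_link * len(core_ids)
    for snap in range(i + 2, i + n_step + 1):
        best = None
        for orphan_id, members in orphans_by_snapshot.get(snap, {}).items():
            shared = sum(1 for p in core_ids if p in members)
            if shared >= needed and (best is None or shared > best[1]):
                best = (orphan_id, shared)
        if best is not None:
            return snap, best[0]  # earlier snapshots take precedence
    return None
```

Looping over the snapshots in increasing order and returning at the first match enforces the condition that no orphan descendant exists at an earlier snapshot.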

If this procedure results in the identification of a descendant 
for a sub-group, then that descendant will be used in the subse- 
quent stages of the construction of the merger trees. For all other 
sub-groups the descendant is taken to be the immediate descendant 
at the next snapshot. For the construction of merger trees, having a 
sub-group and its descendant separated by multiple snapshot out- 
puts is not a problem. However, this is inconvenient for codes, such 
as GALFORM, which expect the descendant of a sub-halo to always 
be found in the next snapshot. To avoid this, for those sub-halos 
that are temporarily lost, interpolated sub-halos are inserted at each 
snapshot where the sub-halo is 'missing'. For very high resolution 
simulations this is a common occurrence. However, for simulations 
like the Millennium Simulation such interpolated sub-halos are rare. 

Next, the sub-groups at each snapshot are organised into a hi- 
erarchy of halos, sub-halos, sub-sub-halos etc. For each sub-group 
in a FOF group we identify the least massive of any more massive 
"enclosing" sub-groups in the same FOF group. Sub-group A is 
said to enclose sub-group B if B's centre lies within twice the half 
mass radius of A. Any sub-group which is not enclosed by another 
is considered to be an independent halo. We also consider a sub- 
group to be an independent halo if it has retained at least 75 per 
cent of the maximum mass it has ever had while being the most 
massive sub-group in its FOF group. This is because we expect a 
halo involved in a genuine merger with a more massive halo to be 
stripped of mass. In either case, if a sub-group is deemed to be an 
independent halo then any sub-groups it encloses are also assigned 
to that halo. 
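
The enclosure test and the two routes to independent-halo status can be sketched as below, for the sub-groups of a single FOF group. The dictionary keys and function names are hypothetical; `peak_mass_as_main` stands for the maximum mass the sub-group has had while being the most massive sub-group in its FOF group, and is `None` if it never was.

```python
import math


def encloses(centre_a, r_half_a, centre_b):
    """True if B's centre lies within twice the half mass radius of A."""
    return math.dist(centre_a, centre_b) < 2.0 * r_half_a


def find_parent(index, subgroups):
    """Least massive of the more massive sub-groups enclosing this one."""
    me = subgroups[index]
    candidates = [
        j for j, other in enumerate(subgroups)
        if other["mass"] > me["mass"]
        and encloses(other["centre"], other["r_half"], me["centre"])
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda j: subgroups[j]["mass"])


def is_independent(index, subgroups, retained_fraction=0.75):
    """Independent if not enclosed, or if it has kept enough of its peak mass."""
    me = subgroups[index]
    peak = me["peak_mass_as_main"]
    retained = peak is not None and me["mass"] >= retained_fraction * peak
    return find_parent(index, subgroups) is None or retained
```

Sub-groups that fail both tests are attached to the halo of their parent, so the hierarchy is built by following `find_parent` upwards until an independent halo is reached.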

At this stage we have, for each snapshot, a population of ha- 
los, each of which consists of a grouping of SUBFIND sub-groups 
with pointers linking each sub-group with its descendant at the next 
snapshot. We choose the descendant of a halo to be the halo at the 
next snapshot which contains the descendant of the most massive 
sub-group in the halo. This defines the halo merger tree structure. 
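
The halo descendant rule amounts to a single lookup through the most massive sub-group. In the sketch below the container types and names are hypothetical.

```python
def halo_descendant(halo_subgroups, mass, subgroup_descendant, halo_of_next):
    """Return the halo at the next snapshot that is the descendant of a halo.

    halo_subgroups      : IDs of the sub-groups grouped into this halo
    mass                : dict sub-group ID -> mass
    subgroup_descendant : dict sub-group ID -> descendant sub-group ID
    halo_of_next        : dict sub-group ID (next snapshot) -> halo ID
    """
    most_massive = max(halo_subgroups, key=lambda s: mass[s])
    descendant = subgroup_descendant.get(most_massive)
    if descendant is None:
        return None  # the most massive sub-group has no descendant
    return halo_of_next[descendant]
```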

The GALFORM model assumes that when a halo merges with 
another, more massive "host" halo, its hot gas is stripped away 
so that no further gas can cool in the less massive halo. Since a 
halo can only be stripped of hot gas once, we wish to treat these 
objects as satellite sub-halos within their host halo for as long as 
they survive in the simulation, even if their orbit puts them outside 
the virial radius of their host halo at some later times. We therefore 
attempt to identify cases where halos fragment, and re-merge them. 

In practice we implement this by looking for satellite sub- 
groups which split off their host to become independent halos at 
the next snapshot. A sub-group will be re-merged if it satisfies all 
of the following conditions: 

• The sub-group is the most massive progenitor of its descen- 
dant. This is taken to mean that the sub-group survives at the next 
snapshot. 

• The sub-group is not the most massive sub-group in its halo. 
This indicates that it is a satellite sub-halo within a larger halo. 

• The descendant of the sub-group is the most massive group in 
its halo. 

• The descendant of the sub-group belongs to a halo other than 
the descendant of the halo containing the original sub-group. This 
indicates that the host halo has fragmented. 

The last condition is necessary because a sub-group can some- 
times become the most massive in its parent halo without any halo 
fragmentation occurring, especially if the halo consists of two sub- 
groups of similar mass. If these conditions are met, the halo con- 
taining the descendant of the satellite sub-group is merged with the 
descendant of the host halo. 
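
The four conditions combine into a single predicate. The sketch below uses hypothetical argument names and assumes that the relevant flags and halo identifiers have already been extracted from the trees.

```python
def should_remerge(is_main_progenitor,
                   is_most_massive_in_halo,
                   descendant_is_most_massive_in_halo,
                   descendant_halo,
                   host_descendant_halo):
    """True if a satellite sub-group has split off and must be re-merged.

    descendant_halo      : halo containing the sub-group's descendant
    host_descendant_halo : descendant of the halo containing the sub-group
    """
    return (
        is_main_progenitor                       # the sub-group survives
        and not is_most_massive_in_halo          # it is a satellite
        and descendant_is_most_massive_in_halo   # it now heads a halo
        and descendant_halo != host_descendant_halo  # the host has fragmented
    )
```

When the predicate is true, the halo identified by `descendant_halo` is merged into `host_descendant_halo`, which keeps the trees strictly hierarchical.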

Following this post-processing, we are left with, for the Mil- 
lennium Simulation, approximately 20 million halo merger trees 
with, in total, approximately 1 billion nodes. 



3.2 The GALFORM semi-analytic model 

The Durham semi-analytical galaxy formation model, GALFORM, 
originally developed by Cole et al. (2000), models the star for- 
mation and merger history of a galaxy and makes predictions for 
many galaxy properties including luminosities over a substantial 
wavelength range extending from the far-UV through to the sub- 
millimetre (Baugh et al. 2005, Lacey et al. 2008, 2010, Lagos et al. 
2011a, Fanidakis et al. 2011, Lagos et al. 2012). 



3.2.1 Model overview 

The GALFORM model populates a distribution of dark matter ha- 
los with galaxies by using a set of coupled differential equations 
to determine how, over a given time-step, the "subgrid" physics 
regulate the size of the various baryonic components of galaxies. 
GALFORM models the main physical processes governing the for- 
mation and evolution of galaxies: (i) the collapse and merging of 
dark matter (DM) halos, (ii) the shock-heating and radiative cooling 
of gas inside DM halos, leading to the formation of galactic disks, 
(iii) quiescent star formation in galactic disks, (iv) feedback as a 
result of supernovae, active galactic nuclei and photo-ionisation of 
the inter-galactic medium, (v) chemical enrichment of stars and gas 
and (vi) dynamical friction driven mergers of galaxies within DM 
halos, capable of forming spheroids and triggering starburst events. 
The prescriptions for these physical processes are described 
in a series of papers: Cole et al. (2000); Benson et al. (2003); Baugh 
et al. (2005); Bower et al. (2006); Font et al. (2008); Lacey et al. 
(2008); Lagos et al. (2011b), as well as in the reviews by Baugh 
(2006) and Benson & Bower (2010). 

The star-formation history of a galaxy can be determined by 
tracking the star-formation rate, SFR, and chemical enrichment pre- 
dicted in its progenitors. Convolving this with a model SSP (single 
stellar population) allows one to predict the spectral energy distri- 
bution (SED) of the galaxy, which in turn can be sampled by filter 
transmission curves to predict rest-frame magnitudes in various ul- 
traviolet (UV), optical, near-infrared and far-infrared bands. Given 
the redshift of a galaxy, the fixed filter transmission curves can be 
shifted by an appropriate amount to obtain observer-frame magni- 
tudes. Dust extinction is incorporated by assuming the dust (whose 
mass is predicted by the chemical evolution model) is mixed to- 
gether with the stars in the disk of the galaxy in two phases: in 
clouds and in a diffuse component (see Granato et al. 2000). As- 
suming a distribution of dust grain sizes, and combining this with 
the predicted scalelengths of the disk and bulge, allows one to cal- 
culate the optical depth and apply the appropriate attenuation to the 
luminosity at various wavelengths. We use the stellar population 
synthesis model of Bruzual & Charlot (a version from 1999 that 
is described in Bruzual & Charlot 2003) and assume a Kennicutt 
(1983) IMF in all modes of star formation. 
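
The convolution of the star formation history with the SSP model, described earlier in this paragraph, can be written schematically as follows. The notation is ours and is intended only as a sketch: $\psi(t')$ is the star formation rate, $Z(t')$ is the metallicity of the stars forming at time $t'$, and $S_\lambda(\tau, Z)$ is the luminosity per unit wavelength, per unit stellar mass formed, of an SSP of age $\tau$ and metallicity $Z$ for the adopted IMF. Dust attenuation is omitted.

$$L_\lambda(t) = \int_0^{t} \psi(t')\, S_\lambda\left(t - t',\, Z(t')\right)\, {\rm d}t'$$

In a hierarchical model the integral is in practice a sum over the star formation histories of all of the progenitors of the galaxy.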

We set the adjustable parameters in GALFORM by requiring 
that the model predictions match a subset of observations, primarily 
of the local galaxy population. We have traditionally assigned more 
weight in this process to matching the optical and near-infrared
luminosity functions (see e.g. Bower et al. 2010 for a discussion of
an automated version of this process). The requirement of match- 
ing the observed luminosity function has led to the inclusion of dif- 
ferent feedback mechanisms to regulate star formation. Feedback 
from stellar winds and supernovae (SNe) is important for reheat- 
ing cold gas (and thus quenching the star formation) in small ha- 
los. This process has been shown to allow the models to reproduce 
the faint end of the observed galaxy luminosity function (Benson
et al. 2003). The major extension made in Bower et al. (2006) was
the introduction of feedback due to active galactic nuclei (AGN),
which quenches the cooling flow in massive, quasi-static hot halos
and consequently shuts down star formation in their central galaxies.
This proposed solution to the over-cooling problem that had
long plagued models of galaxy formation (e.g. Benson et al. 2003)
proved necessary to explain the break in the luminosity function,
allowing the model to reproduce the b_J-band and K-band luminosity
functions (including the evolution at the bright end) out to
redshift z ~ 2. In addition, the Bower et al. model is able to accurately
predict the evolution of the galaxy stellar mass function out
to z ~ 5 (Bielby et al. 2011), successfully reproduce the clustering
and abundance of luminous red galaxies as seen in the SDSS
(Almeida et al. 2008), produce a bimodal distribution of galaxy
colours that is in good agreement with observations of the SDSS
(Gonzalez et al. 2009) and match the number counts and redshift
distribution of extremely red objects (Gonzalez-Perez et al. 2009).

The GALFORM calculation uses dark matter merger histories 
extracted from the Millennium simulation. As commented upon 
above, this simulation has 64 snapshots. Information about the 
baryonic content of galaxies is tracked on finer timesteps, with 
8 "sub-steps" inserted between the N-body output times. Parts of 
the calculation, for example the luminosity of the composite stellar
population, can be followed on an even finer time grid, determined
by an adaptive differential equation solver.

3.2.2 Placement of galaxies in halos 

In the GALFORM model, the treatment of the properties of a galaxy 
will depend upon its status within its host halo, i.e. whether it is a 
central or a satellite galaxy. Central galaxies are placed at the centre 
of the most massive sub-halo of the host halo and are the focus for 
all gas that is undergoing cooling. In the event of a halo merger, we 
choose the central galaxy of the main (most massive) progenitor 
halo as the central galaxy of the descendant halo, with any other 
galaxies becoming satellites. It should be noted that according to 
this definition, the central galaxy of each (sub)halo need not be the 
most luminous or the one with the largest stellar mass. 

Following a halo merger, the central galaxy of the less massive 
progenitor halo becomes a satellite galaxy of the descendant halo. 
If the most massive sub-halo of the less massive progenitor can no 
longer be resolved (i.e. the sub-halo now has fewer than 20 parti- 
cles and has been lost), then the galaxy is placed on what was the 
most bound particle in that sub-halo. Satellite galaxies are stripped 
of their hot gas, thus quenching any further cooling and inhibiting 
long-term star-formation (see Font et al. 2008 for an alternative
cooling model for satellites). 

A merger timescale is calculated based upon the initial energy 
and angular momentum of the satellite's orbit (which is chosen at 
random), as well as the mass of the satellite and the mass of the 
halo hosting the central galaxy and satellite system. It is expected 
that after this time the effects of dynamical friction will have caused 
the satellite to merge with the central galaxy. However, the merger 
timescale of a satellite is recalculated every time the satellite's host 
halo merges and becomes a sub-halo of a more massive halo (see 
Cole et al. 2000).



4 LIGHTCONE CONSTRUCTION 

We now outline the method adopted to construct lightcone mock 
catalogues. Our scheme shares many features in common with the
methods used by Blaizot et al. (2005) and Kitzbichler & White
(2007), with some improvements.

By first running the GALFORM model^ on the halo merger
trees of the Millennium Simulation we generate a galaxy population 
that is used to build the lightcone catalogues. Galaxy properties are 
stored for each fixed, snapshot epoch that falls within the redshift 
range of interest for a particular survey. 

An observer is then placed inside the simulation box at a position
that can be set manually^ or at random.

4.1 Replication of the simulation box 

The cosmology used in the Millennium Simulation means that the 
simulation box side-length, L_box = 500 h^-1 Mpc, corresponds to
the co-moving distance out to z = 0.17. Therefore, in order to
generate a cosmological volume that is of sufficient size to fully
contain any galaxy survey that extends out to a modest redshift,
it is necessary to tile replications of the simulation box (see the
discussion in §2.1.2).

The number of replications per axis, n_rep, that need to be
stacked around the original box (containing the observer) is given
by^

$$ n_{\rm rep} = \left\lfloor \frac{r_{\rm max}}{L_{\rm box}} \right\rfloor + 1 , \qquad (2) $$

where r_max is the maximum co-moving radial distance that we
want to reach in the final mock catalogue. Including the original
simulation box, we have a total of (2 n_rep + 1)^3 replications. The
Cartesian co-ordinate system, (X, Y, Z), of the combined 'super- 
cube' is then translated so that the observer is located at the origin. 
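
The tiling step can be illustrated with a minimal sketch. It assumes the replication count is floor(r_max / L_box) + 1 (rounding down, as in the footnote, and adding one box so that an observer anywhere in the original box is covered); the function names are hypothetical and not part of any lightcone code.

```python
import itertools
import math


def n_replications(r_max, l_box):
    """Replications per axis stacked around the original box (assumed form)."""
    return math.floor(r_max / l_box) + 1


def box_offsets(r_max, l_box):
    """Yield the comoving translation of every box in the (2 n_rep + 1)^3 super-cube."""
    n_rep = n_replications(r_max, l_box)
    for ix, iy, iz in itertools.product(range(-n_rep, n_rep + 1), repeat=3):
        yield (ix * l_box, iy * l_box, iz * l_box)


# Example: a 500 Mpc/h box and a catalogue reaching 1200 Mpc/h.
offsets = list(box_offsets(1200.0, 500.0))
```

Only translations are generated here, in keeping with the choice made below not to rotate or reflect the replicated boxes.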

An unfortunate consequence of generating a large volume in 
this way is that structures can appear repeated within the final 
lightcone volume. Although repeated structures cannot have a co- 
moving separation less than the simulation box side-length, if any 
repeated structures have small angular separations when projected 
onto the 'mock sky', then projection-effect artefacts can be introduced
into the catalogue. Blaizot et al. (2005) illustrate the effect of
these artefacts, along with possible methods for eliminating them.
One method that they demonstrate to be effective is to apply random
sequences of π/2 rotations and reflections to the replicated
boxes so that any repeated structures are viewed at different ori- 
entations and appear as different structures. The problem with this 
approach is that, due to the periodic boundary conditions of the N- 
body simulation, when tiling the replications, any transformation 
besides a translation would add undesirable discontinuities into the 
underlying density field. However, if for example one wishes to 
extract clustering statistics, the underlying density field should be 
preserved. We therefore choose not to use this method in the con- 
struction of our lightcone catalogues. 

4.2 Orientating the observer 

Our aim when orientating the observer is to be able to define a
right-handed Cartesian co-ordinate system, (X', Y', Z'), such that the
^ We stress that our lightcone construction algorithms are independent of 
choice of semi-analytic model and can be run using any input galaxy for- 
mation model. 

^ Often one will choose to position the observer manually if they desire the 
observer to be placed in a specific location, such as an environment similar 
to the Local Group. 

^ ⌊x⌋ means that x is rounded down to the nearest integer.

Figure 1. Schematic of lightcone geometry. The Z' axis defines the central
line-of-sight vector of the observer. The angle θ'_r defines the angular size
of the field-of-view of the lightcone. Any galaxy whose position vector,
r'(X', Y', Z'), is offset from the Z' axis by an angle θ' > θ'_r is excluded
from the lightcone.

observer is looking down the Z' axis, as illustrated in Fig. 1. This
axis defines the central axis of symmetry of the conical volume of
the lightcone and points to the centre of the field of the lightcone on
the mock sky, i.e. it points along the central line-of-sight vector of
the observer. The half-opening angle, θ'_r, governs the angular extent
of the field-of-view of the lightcone (see §4.3). The orientation of
the observer is simply how we describe this vector, Z', in terms of
the global Cartesian axes of the "super cube", Z'(X, Y, Z).

For deep, pencil-beam mock catalogues, carefully choosing 
the orientation of the observer can minimise, or even remove, 
structure repetition. The approach adopted by Kitzbichler & White
(2007) is to orientate the observer in a 'slanted' direction, with
respect to the Cartesian axes of the simulation box, so that the
observer is not looking along any of the Cartesian axes or the
cube diagonals (along which structure repetition can introduce
noticeable artefacts). By defining the central line-of-sight of the
observer as Z'(X, Y, Z) = (n, m, nm), where m and n are integers
with no common factor, Kitzbichler & White are able to construct
lightcone catalogues with a near-rectangular sky coverage of
1/(m^2 n) × 1/(n^2 m) (radians) in which the first repeated structure
will lie at a distance of ~ m n L_box from the observer.^ When
constructing lightcone catalogues for which we wish to minimise duplicated
structures, albeit at the expense of the solid angle of the catalogue, 
we adopt this approach. This is necessary for applications consid- 
ering, for example, the angular clustering of galaxies, where pro- 
jection effects could severely distort the clustering signal. 

Once we are satisfied with the chosen orientation of Z' we
define the X' axis (to be perpendicular to both Z' and X) and the
Y' axis (to be perpendicular to both X' and Z').
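
As a rough illustration of the orientation step, the sketch below builds a right-handed orthonormal triad with Z' along (n, m, nm). It assumes that X' is taken along Z' × X (perpendicular to both Z' and the global X axis) and that Y' = Z' × X' completes the triad; the sign conventions and function names are assumptions made for this example.

```python
import math


def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def unit(a):
    """Normalise a 3-vector to unit length."""
    norm = math.sqrt(sum(c * c for c in a))
    return tuple(c / norm for c in a)


def observer_axes(n, m):
    """Right-handed basis (X', Y', Z') with Z' along (n, m, n*m)."""
    if math.gcd(n, m) != 1:
        raise ValueError("n and m should have no common factor")
    z_p = unit((n, m, n * m))
    x_p = unit(cross(z_p, (1.0, 0.0, 0.0)))  # perpendicular to Z' and global X
    y_p = cross(z_p, x_p)                    # completes the right-handed triad
    return x_p, y_p, z_p


def first_repeat_distance(n, m, l_box):
    """Distance to the first periodic image met along the line of sight."""
    return l_box * math.sqrt(n * n + m * m + (n * m) ** 2)
```

For n = 2 and m = 3 with a 500 Mpc/h box, the first periodic image along the line of sight lies at 3500 Mpc/h, slightly beyond the approximate value m n L_box = 3000 Mpc/h.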

4.3 Finalising the lightcone geometry 

Now that we know the location and orientation of the observer we 
can set about applying the necessary geometrical cuts to construct 

^ Carlson & White (2010) adopt a similar approach to Kitzbichler & White
by performing volume remapping of the original simulation box such that 
the mock catalogue geometry can fit inside the new geometry without the 
need for box replication. 



the lightcone volume. The first step is to isolate a spherical volume 
about the observer, with a co-moving radius, rmax, whose value is 
sufficiently large that, given the flux limits of the survey we wish 
to emulate, we would expect to be well into the high-redshift tail of 
the galaxy redshift distribution (such that only a negligible fraction 
of the brightest, high redshift galaxies are missed). This radial cut is 
applied to help speed up the calculation so that we are not searching 
for galaxies in box replications that are too far from the observer to 
contribute a significant number of objects to the mock catalogue. 
For boxes with a fraction of their volume lying within rmax, we 
check when each galaxy will enter the observer's past lightcone 
(see §4.4). If a galaxy enters the lightcone at a distance greater than
r_max (or it never enters the lightcone at all) then it is discarded.

Next we apply an angular cut on the mock galaxies, which is 
dictated by the solid angle of the galaxy survey we wish to mimic. 
The solid angle in steradians, Ω, of the mock catalogue is defined
by

$$ \Omega = 2\pi \left[ 1 - \cos(\theta'_r) \right] , \qquad (3) $$

where θ'_r is the field-of-view angle of the catalogue. By varying
the value of θ'_r we can construct lightcones with solid angles ranging
from pencil beams to all-sky (4π) surveys.^ Following this cut,
the catalogue volume resembles the sector of a sphere, with half-opening
angle θ'_r and Z' as its axis of symmetry. For those boxes
whose volume overlaps that of the catalogue, we calculate the position
at which each galaxy enters the lightcone. Using this position
we calculate the angle θ', the angle between the position vector of
the galaxy and the Z' axis, and discard any galaxy with θ' > θ'_r,
i.e. any galaxy that lies outside the solid angle of the catalogue.
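
The angular cut can be sketched as follows. The example evaluates Eq. (3), tests whether a position lies within the half-opening angle of the Z' axis, and also evaluates the masked all-sky solid angle quoted in the footnote; the function names are hypothetical.

```python
import math


def solid_angle(theta_r):
    """Solid angle (steradians) of a cone with half-opening angle theta_r, Eq. (3)."""
    return 2.0 * math.pi * (1.0 - math.cos(theta_r))


def masked_all_sky_solid_angle(b_lim):
    """All-sky solid angle after removing the strip |b| < b_lim (radians)."""
    return 4.0 * math.pi - 2.0 * math.pi * (math.sin(b_lim) - math.sin(-b_lim))


def inside_field(position, z_axis, theta_r):
    """True if the angle between position and the Z' axis is at most theta_r."""
    dot = sum(p * a for p, a in zip(position, z_axis))
    norm = (math.sqrt(sum(p * p for p in position))
            * math.sqrt(sum(a * a for a in z_axis)))
    cos_theta = max(-1.0, min(1.0, dot / norm))  # guard against rounding
    return math.acos(cos_theta) <= theta_r
```

Setting theta_r = π in `solid_angle` recovers the full sky, 4π steradians.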

Finally, for those galaxies that are successfully included in the 
lightcone, we determine their right ascension, α, and declination, δ,
on the mock 'sky'. We do this by first defining a sky coordinate system
such that the observer's central line-of-sight vector, Z', points
towards a right ascension, α_0, and declination, δ_0, on the sky. We
then determine the sky position of a galaxy by passing r(X, Y, Z)
through the transformation R_Z(α_0) R_Y(π/2 − δ_0), where R_Z
and R_Y are the standard 3-dimensional Cartesian rotation matrices
about the Z and Y axes respectively. (We assume that lines of constant
declination lie parallel to the X−Y plane, so do not apply
any rotation about the X axis.)
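
A minimal sketch of this transformation is given below. It assumes that the input vector is expressed in the observer frame (X', Y', Z'), that the rotation matrices follow the usual active, right-handed convention, and that right ascension is measured from the rotated X axis; these conventions and the function names are assumptions of the example.

```python
import math


def rot_z(angle):
    """Standard rotation matrix about the Z axis."""
    c, s = math.cos(angle), math.sin(angle)
    return ((c, -s, 0.0), (s, c, 0.0), (0.0, 0.0, 1.0))


def rot_y(angle):
    """Standard rotation matrix about the Y axis."""
    c, s = math.cos(angle), math.sin(angle)
    return ((c, 0.0, s), (0.0, 1.0, 0.0), (-s, 0.0, c))


def apply(matrix, vec):
    """Multiply a 3x3 matrix by a 3-vector."""
    return tuple(sum(matrix[i][j] * vec[j] for j in range(3)) for i in range(3))


def sky_position(r_prime, alpha0, delta0):
    """Right ascension and declination (radians) of an observer-frame position."""
    v = apply(rot_z(alpha0), apply(rot_y(math.pi / 2.0 - delta0), r_prime))
    radius = math.sqrt(sum(c * c for c in v))
    dec = math.asin(max(-1.0, min(1.0, v[2] / radius)))
    ra = math.atan2(v[1], v[0]) % (2.0 * math.pi)
    return ra, dec
```

With these conventions a galaxy lying exactly on the Z' axis is mapped to (α_0, δ_0), as required of the field centre.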

4.4 Positioning galaxies within the lightcone 

The lightcone selection of galaxies is carried out by identifying 
those galaxies whose light has sufficient time to reach the observer. 
However, before one can calculate when a galaxy enters the light- 
cone, one must determine the epoch at which its host dark matter 
halo enters the observer's past lightcone. 

4.4.1 Placement of halo centres 

A halo, located at r(X, Y, Z, t), at a lookback time, t, will be "visible"
to the observer at all instances where |r(X, Y, Z, t)| ≤ r_c(t),

^ By setting θ'_r = π we can construct all-sky lightcone catalogues. When
constructing such catalogues we can apply an additional geometrical cut to
remove galaxies that would be obscured by the plane of the Milky Way.
Having calculated the celestial co-ordinates of the galaxy on the mock sky,
we determine the galactic latitude, b, of the galaxy and reject all galaxies
with |b| < b_lim, where b_lim is the user-specified galactic latitude limit.
The solid angle of the all-sky lightcone is then calculated as Ω_all-sky =
4π − 2π [sin(b_lim) − sin(−b_lim)].

where r_c is the maximum distance that light could have travelled
in the time t, i.e. the maximum co-moving, radial distance that is
visible to the observer. For a flat cosmology, i.e. with Ω_k = 0, at
the epoch corresponding to redshift, z, the maximum co-moving,
radial distance, r_c, that is visible to an observer at the present day
is given by,

$$ r_c(z) = \int_0^z \frac{c\,{\rm d}z'}{H_0 \sqrt{\Omega_{\rm m}(1+z')^3 + \Omega_\Lambda}} , \qquad (4) $$

where H_0 is the Hubble constant at the present day, Ω_m is the
matter density of the Universe and Ω_Λ is the vacuum energy density
of the Universe, both at the present day. To construct a mock galaxy
catalogue, we place each halo at the epoch at which it enters the 
observer's past lightcone, i.e. the epoch at which the halo would 
first become "visible" to the observer. If this epoch corresponds to 
a redshift, z, then the halo is placed at the position, r(X, Y, Z, z),
at which

$$ \left| {\bf r}(X, Y, Z, z) \right| = r_c(z) . \qquad (5) $$
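
For illustration, the co-moving distance of Eq. (4) can be evaluated numerically. The sketch below uses composite Simpson integration; the default parameters Ω_m = 0.25 and Ω_Λ = 0.75 are assumed here to represent the Millennium Simulation cosmology, and with H_0 = 100 km/s/Mpc the result is in units of h^-1 Mpc. The function name is hypothetical.

```python
import math

C_KM_S = 299792.458  # speed of light in km/s


def comoving_distance(z, omega_m=0.25, omega_l=0.75, h0=100.0, steps=1000):
    """Co-moving distance r_c(z) for a flat cosmology, by Simpson's rule.

    With h0 = 100 km/s/Mpc the returned distance is in Mpc/h.
    """
    if z <= 0.0:
        return 0.0
    n = steps if steps % 2 == 0 else steps + 1  # Simpson needs an even count

    def inv_e(zp):
        return 1.0 / math.sqrt(omega_m * (1.0 + zp) ** 3 + omega_l)

    dz = z / n
    total = inv_e(0.0) + inv_e(z)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * inv_e(i * dz)
    return C_KM_S / h0 * total * dz / 3.0
```

Under these assumptions `comoving_distance(0.17)` comes out close to the 500 h^-1 Mpc box length, consistent with the statement in §4.1.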



Each snapshot, i, in the Millennium Simulation corresponds
to a discrete cosmic epoch, with redshift, z_i. To determine when
a halo enters the lightcone, we loop over the simulation snapshots
searching for the time-step during which Eq. (5) is satisfied. By
comparing the position, r_j(X_j, Y_j, Z_j, z_i), of a halo j that exists at
z_i to the maximum co-moving distance, r_c(z_i), that is visible at that
epoch, and doing the same for the descendant of the halo, labelled
k, that exists at the next snapshot, z_{i+1} < z_i, we can determine
whether halo j will enter the lightcone between the snapshots i and
i + 1, i.e. whether z_{i+1} < z < z_i.

Using r_j(X_j, Y_j, Z_j, z_i) and r_k(X_k, Y_k, Z_k, z_{i+1}) as boundary
conditions, we interpolate along the orbital path of the halo j
to find the exact epoch, z, at which it enters the lightcone and the
position, r_j(X, Y, Z, z), at which this occurs. We use a cubic polynomial
to describe the position of the halo, in each Cartesian direction,
as a function of the time t between the adjacent snapshots (i.e.
t_{i+1} < t < t_i). For example, the Cartesian X component of the
path is given by,

$$ X(t) = A_X t^3 + B_X t^2 + C_X t + D_X , \qquad (6) $$

where A_X, B_X, C_X and D_X are coefficients that can be determined
by requiring that the boundary conditions (X(t = t_i) =
X_j(t_i), X(t = t_{i+1}) = X_k(t_{i+1}), Ẋ(t = t_i) = Ẋ_j(t_i),
Ẋ(t = t_{i+1}) = Ẋ_k(t_{i+1})) are satisfied. The X component of
the velocity of the galaxy at time, t, is given by the derivative of
Eq. (6) with respect to time. Equations similar to Eq. (6) can be derived
for the Y(t) and Z(t) components. The centre of mass of the
halo is then placed at r_j(X, Y, Z, z).
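
The cubic of Eq. (6) can be sketched for one Cartesian component as follows. For numerical convenience the polynomial is written in the shifted variable τ = t − t_i, which is an equivalent reparametrisation of Eq. (6); the velocities are assumed to be derivatives with respect to the same time variable t, and the function name is hypothetical.

```python
def cubic_path(t0, x0, v0, t1, x1, v1):
    """Cubic matching position and velocity at t0 and t1.

    Returns two callables giving X(t) and dX/dt.
    """
    h = t1 - t0
    delta = x1 - x0 - v0 * h
    a = ((v1 - v0) * h - 2.0 * delta) / h ** 3
    b = (3.0 * delta - (v1 - v0) * h) / h ** 2

    def position(t):
        tau = t - t0
        return ((a * tau + b) * tau + v0) * tau + x0

    def velocity(t):
        tau = t - t0
        return (3.0 * a * tau + 2.0 * b) * tau + v0

    return position, velocity


# Example with lookback times t_i = 10 and t_{i+1} = 9 (arbitrary units).
pos, vel = cubic_path(10.0, 1.0, 0.5, 9.0, 0.2, -0.3)
```

The same construction would be repeated for the Y and Z components, after which the lightcone-crossing epoch can be found by solving Eq. (5) along the interpolated path.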

Our decision to use interpolation to determine halo positions 
is an extension of the method of Kitzbichler & White (2007), who
explicitly chose not to use interpolation but instead placed halos according
to the snapshot with the epoch closest to the one at which
the halo enters the lightcone. Kitzbichler & White adopted this approach
because of the difficulties inherent in using interpolation to
predict realistic orbital paths for satellite galaxies. In the next sec- 
tion, we discuss these difficulties and suggest a solution that pro- 
vides a good approximation for our purposes. 



4.4.2 Placement of galaxies 

The finite spatial extent of halos means that central and satellite 
galaxies within a halo will enter the lightcone at slightly different
times. Central galaxies are positioned on the most bound particle
of the most massive SUBFIND group (see §3.1.2) and are at rest
relative to the halo. The location and time at which a central galaxy
enters the lightcone is thus equal to that of its host halo and so for
these galaxies we can use the calculation for the halo centre, as presented
in §4.4.1. However, satellite galaxies can enter the lightcone
at an earlier or later epoch than the centre of the host halo. When 
positioning a satellite galaxy we can still interpolate over the evo- 
lutionary path of its host halo, but we must first correct the spatial 
positions along the path to account for the relative offset between 
the position of the satellite galaxy and the centre of the halo. There- 
fore, we first need a model to describe the orbital path of a satellite 
within its host halo. 

Modelling physically viable satellite orbits is a non-trivial task.
Difficulties arise when the large orbital velocities of satellite galax- 
ies lead to orbital time-scales that are much shorter than the spac- 
ing of the simulation snapshots. Care must therefore be taken to 
ensure that numerical artefacts do not introduce large positional 
errors, which might in turn lead to inaccurate predictions for the 
one-halo term in the galaxy correlation function. 

If, for example, we attempt to describe the orbital path of a 
satellite galaxy using a cubic polynomial in Cartesian space that is 
constrained to satisfy both the position and velocity boundary con- 
ditions, then in rare instances we may find that the large orbital ve- 
locities of satellite galaxies lead to orbital paths that are highly ec- 
centric and extend out to large orbital radii. In the majority of cases 
where orbital velocities are small, the cubic function fits an orbital 
path similar to a simple linear interpolation scheme, which ignores 
the velocity boundary conditions. Example orbits, modelled using 
different interpolation schemes, are shown in Fig. 2. Unfortunately,
if a satellite and its descendent are found on opposing sides of the
halo centre of mass, then these interpolation schemes would lead to
satellites being positioned much closer to the centre of mass of the
halo than they should be. This would have the effect of boosting
the clustering signal on small scales, as shown in the two left-hand
panels of Fig. 3.

Since measurements of galaxy clustering are integral for 
achieving many of the goals set by current and future galaxy sur- 
veys, we choose to prioritise the preservation of the galaxy cluster- 
ing signal in real space. We do this by moving to a 2-dimensional 
(2D) plane, defined by the position of the halo centre of mass and 
the positions of the satellite, j, and its descendent, k, relative to the
halo centre. By assuming that the orbit of satellite j is restricted to
this plane, we use linear interpolation to express the change in the
polar co-ordinates of the orbit of j (relative to the centre of mass
of the halo, located at the origin) as a function of time between the
snapshot epochs, t_{i+1} < t < t_i.

We describe the angular position, φ, of the satellite along its
orbit as

$$ \phi(t) = \phi_j(t_i) + \left[ \phi_k(t_{i+1}) - \phi_j(t_i) \right] \frac{t - t_i}{t_{i+1} - t_i} . \qquad (7) $$



A caveat is that we perform the interpolation along the path that 
corresponds to the smallest angular separation between the posi- 
tion of a satellite and its descendent, which may lead to satellites 
changing directions. To describe the change in the radius, ρ, of the
orbit of the satellite we can choose to either linearly interpolate the
radius in the same way, i.e.

$$ \rho(t) = \rho_j(t_i) + \left[ \rho_k(t_{i+1}) - \rho_j(t_i) \right] \frac{t - t_i}{t_{i+1} - t_i} , \qquad (8) $$

Figure 2. Examples demonstrating the modelling of the orbital paths of satellite galaxies between two adjacent simulation snapshots using different interpolation schemes. The positions of the satellite galaxies are displayed relative to the centre of mass of the halo, which is marked with a +. Circles show circular orbits at the start and end radii of the path of the satellite galaxy. The various interpolation schemes, which use either 3-dimensional Cartesian co-ordinates or 2-dimensional polar co-ordinates, are discussed in §4.4.2 and are shown using different line colours and styles, as indicated by the key in the top left panel. For the application presented in §5 we use the 2-dimensional polar, linear interpolation scheme.

Figure 3. The real-space correlation function of galaxies predicted using four different satellite interpolation schemes: cubic interpolation in 3D Cartesian
space (far left), linear interpolation in 3D Cartesian space (middle left), radial interpolation in 2D polar space (middle right) and modelling the satellite orbits 
using a logarithmic spiral in 2D polar space (far right). The upper panels show the correlation function for galaxies at two adjacent simulation snapshots 
(corresponding to redshifts z = 1.91 and z = 2.07, grey and black dashed lines) and the same galaxies at six intermediate redshifts (various solid, coloured 
lines). The lower panels show the ratio of each correlation function, relative to the correlation function measured at the z = 2.07 snapshot.

or couple the radius to the angle using a simple model, such as a
logarithmic spiral,

$$ \rho(t) = a\, e^{b\,\phi(t)} , \qquad (9) $$

where a = ρ_j(t_i) and b = ln[ρ_k(t_{i+1}) / ρ_j(t_i)] / φ_k(t_{i+1}), with φ
measured from the position of j at t_i (so that φ_j(t_i) = 0). Note
that in these two cases we are ignoring the velocity boundary conditions
and assuming that d²ρ/dt² = d²φ/dt² = 0. However, as can be seen
in Fig. 3, these methods preserve the galaxy clustering to smaller
scales than is possible with the 3-dimensional (3D) cubic or linear
approaches. Note that in the above cases, we can interpolate the orbital
velocities of satellites using the same equations as used for the
positions. For the application in §5 we have chosen to interpolate
the satellite positions using Eq. (7) and Eq. (8), i.e. a 2D polar linear
interpolation of both the angle and radius of the satellite orbit.
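
The adopted 2D polar scheme can be sketched as below: both the angle and the radius are interpolated linearly in time, with the angle following the shorter of the two arcs joining the start and end positions, as described in the caveat above. The wrapping of the angular difference into [−π, π) is one possible way of selecting the shorter arc, and the function names are hypothetical.

```python
import math


def polar_interpolate(rho_j, phi_j, rho_k, phi_k, t_i, t_ip1, t):
    """Linearly interpolate (rho, phi) between snapshots i and i+1.

    The angle follows the shorter arc between phi_j and phi_k.
    """
    frac = (t - t_i) / (t_ip1 - t_i)
    # Wrap the angular difference into [-pi, pi) to pick the shorter arc.
    dphi = (phi_k - phi_j + math.pi) % (2.0 * math.pi) - math.pi
    phi = phi_j + dphi * frac
    rho = rho_j + (rho_k - rho_j) * frac
    return rho, phi


def to_cartesian_2d(rho, phi):
    """Convert in-plane polar co-ordinates to Cartesian offsets."""
    return rho * math.cos(phi), rho * math.sin(phi)
```

The in-plane offsets returned by `to_cartesian_2d` would still need to be rotated back into the 3D frame of the simulation before being added to the halo position.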

By converting back to 3D Cartesian coordinates we can express
the epoch, z, at which the satellite enters the lightcone as the
redshift at which,

$$ \left| {\bf r}_{\rm halo}(X, Y, Z, z) + {\bf r}_j(X, Y, Z, z) \right| = r_c(z) , \qquad (10) $$

where r_halo(X, Y, Z, z) is the global position of the dark matter halo
hosting the satellite at this epoch and r_j(X, Y, Z, z) is the position
of the satellite relative to the centre of this halo.
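
The crossing condition can be solved numerically between two snapshots. The text does not specify a root-finding method, so the sketch below simply assumes bisection on |r(z)| − r_c(z); `position_at` and `r_c` are hypothetical callables supplying the interpolated total position (halo plus satellite offset) and the co-moving distance at redshift z.

```python
import math


def crossing_redshift(position_at, r_c, z_lo, z_hi, tol=1e-8, max_iter=200):
    """Bisect for the redshift at which |position_at(z)| equals r_c(z).

    Returns None if the object does not cross the lightcone in [z_lo, z_hi].
    """
    def f(z):
        x, y, zc = position_at(z)
        return math.sqrt(x * x + y * y + zc * zc) - r_c(z)

    f_lo, f_hi = f(z_lo), f(z_hi)
    if f_lo == 0.0:
        return z_lo
    if f_hi == 0.0:
        return z_hi
    if f_lo * f_hi > 0.0:
        return None  # no sign change: no crossing in this interval
    for _ in range(max_iter):
        z_mid = 0.5 * (z_lo + z_hi)
        f_mid = f(z_mid)
        if f_mid == 0.0 or (z_hi - z_lo) < tol:
            return z_mid
        if f_lo * f_mid < 0.0:
            z_hi = z_mid
        else:
            z_lo, f_lo = z_mid, f_mid
    return 0.5 * (z_lo + z_hi)
```

A returned value of None corresponds to an object that does not enter the lightcone during that time-step, in which case the next pair of snapshots would be examined.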

Fig. 3 shows that between z = 1.91 and z = 2.07 (which is
a 5% change in 1 + z) there is a 20% difference in the amplitude
of ξ(r). If we did not interpolate the position of satellite galaxies between
snapshots, then at a redshift intermediate to z = 1.91 and
z = 2.07, there would be up to a ~10% error in the correlation
function. Whilst any interpolation scheme is approximate, it is clear 
that it is better to make an attempt to adjust the galaxy positions if 
the lightcone crossing occurs between snapshots, rather than jump- 
ing from one set of fixed positions to the other, which would result 
in discontinuities in the correlation function. 



4.5 Treatment of galaxy properties in the lightcone 

4.5.1 Intrinsic properties 

For each galaxy that enters the lightcone and satisfies the geometrical
cuts described in §4.3, we need to output galaxy properties (i.e.
stellar mass, SFR, etc.) that are appropriate for the epoch at which 
we have placed the galaxy. 

With knowledge of the star-formation history of the galaxy, 
we can follow the evolution of any galaxy property over cosmic 
time. However, as with galaxy positions, this information is only 
recorded at the discrete epochs corresponding to the simulation 
snapshots. Ideally we would like to again use interpolation to deter- 
mine the value for any galaxy property at any given epoch. Unfortu- 
nately the evolution of the majority of galaxy properties is complex 
and by interpolating between the snapshot epochs we risk over- 
simplifying this evolution and deriving incorrect values. For in- 
stance, the build-up of the stellar mass of a galaxy between two con- 
secutive snapshots will receive contributions from many different 
sources. Besides quiescent star formation in the disk, many other 
events, such as disk instabilities or mergers with one or more other 
galaxies, can lead to starbursts and a SFR that is highly variable 
with time. In the case of galaxy mergers we cannot accurately say, 
from the snapshot data alone, when during a time-step the merger
occurred. Therefore interpolation over the properties of each pro- 
genitor may lead to double counting and, at the epochs at which 
they enter the lightcone, each progenitor having properties that are 
(possibly significantly) mis-estimated. 

Figure 4. Distribution of SDSS g − r colour as a function of redshift, z,
in a lightcone catalogue constructed for the GAMA survey, both without
k-correction interpolation (upper panel) and with k-correction interpolation
(lower panel, see §4.5.2 for details). Shading corresponds to the number
density of galaxies. (Note that the apparent vertical stripes in the galaxy
distributions correspond to peaks in the galaxy redshift distribution.)

We could evaluate galaxy properties for any given epoch by
solving the set of coupled differential equations that govern the exchange
of material between the hot gas in the halo and the cold gas
and stars in the galaxy. However, this exercise is non-trivial and 
would require the full calculation performed by GALFORM to be 
reproduced for each galaxy out to the epoch at which it enters the 
lightcone, which would be computationally expensive. Similarly, 
we could have originally run GALFORM and output the galaxy prop- 
erties on a finer time mesh. However, this would extend the run time 
of the model and take up significantly more disk space. Instead, we 
adopt a procedure similar to that of Kitzbichler & White (2007)
and, for any galaxy that enters the lightcone, we assign the galaxy 
the intrinsic properties it had at the snapshot immediately prior to 
the epoch, z, at which it entered the lightcone, i.e. the snapshot i 
with the smallest redshift, z_i, for which z_i > z.
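
The snapshot selection rule can be written compactly. In the sketch below the snapshot redshifts are assumed to be held in a list sorted in increasing order, and the returned value is a position in that list rather than a Millennium snapshot number; the function name is hypothetical.

```python
import bisect


def snapshot_for_properties(snapshot_redshifts, z_cross):
    """Index of the snapshot with the smallest redshift z_i such that z_i > z_cross.

    snapshot_redshifts must be sorted in increasing order. Returns None if
    no snapshot lies at a higher redshift than z_cross.
    """
    idx = bisect.bisect_right(snapshot_redshifts, z_cross)
    return idx if idx < len(snapshot_redshifts) else None


# Example using the two snapshot redshifts quoted in the text.
redshifts = [0.0, 0.5, 1.0, 1.91, 2.07]
chosen = snapshot_for_properties(redshifts, 2.0)
```

Using `bisect_right` enforces the strict inequality z_i > z, so a galaxy crossing exactly at a snapshot redshift is assigned the properties of the next snapshot at higher redshift.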

4.5.2 Observed properties 

Figure 5. The predicted distribution of K_AB ≤ 23 galaxies with 1.4 < z < 2.5 in the BzK colour plane, colour coded, as indicated by the key on the right of each panel, according to the median value in a 2-dimensional colour-colour bin for different galaxy properties: stellar mass (upper left), star-formation rate (upper right), stellar metallicity (lower left) and stellar mass weighted age (lower right). The solid and dashed lines correspond to the BzK criteria used by Daddi et al. (2004a) (see §5.1 for further details). The sBzK and pBzK regions have been labelled in the upper left panel.

Having set the intrinsic properties of the galaxies, we can now use
this information, along with their positions, to evaluate their observed
properties, namely their observed fluxes (and apparent magnitudes).
At this point we need to use the position of a galaxy to derive
its luminosity distance, d_L, which is required to relate the emitted
luminosity per unit frequency, L_ν(ν_e), of an object to its observed
flux per unit frequency, S_ν(ν_o). For a flat Universe the luminosity
distance out to a redshift z is simply d_L(z) = r_c(z)(1 + z).
Therefore a galaxy in the lightcone at a cosmological redshift z will
have an observed flux,



$$ S_\nu(\nu_o) = (1 + z) \frac{L_\nu(\nu_e)}{4\pi d_L^2(z)} , \qquad (11) $$



where ν_o is the observed frequency of the light from the galaxy. The
emitted (rest-frame) frequency is related to the observed (observer-frame)
frequency by ν_e = ν_o (1 + z). The observer-frame apparent
magnitude of a galaxy, in the AB system, is then given by,



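$$ m_{\rm AB} = -2.5 \log_{10} \left[ \frac{\int S_\nu(\nu_o) R(\nu_o)\, {\rm d}\nu_o}{S_{\nu o} \int R(\nu_o)\, {\rm d}\nu_o} \right] , \qquad (12) $$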



where R(ν_o) is the filter response of a specified photometric band
and S_νo is the AB reference flux per unit frequency (Oke & Gunn
1983).

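In our case, GALFORM calculates the emitted luminosity of a
galaxy, so we can calculate the observer-frame absolute magnitude,
M_AB, of the galaxy (assuming z ≠ 0) from,

$$ M_{\rm AB} = -2.5 \log_{10} \left[ \frac{\int L_\nu(\nu_e) R(\nu_e/(1+z))\, {\rm d}\nu_e}{L_{\nu o} \int R(\nu_e/(1+z))\, {\rm d}\nu_e} \right] , \qquad (13) $$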



where L_νo is now the AB reference luminosity, L_νo =
4π(10 pc)^2 S_νo. From this we can calculate the observer-frame apparent
magnitude of a galaxy, in the AB system, by,

$$ m_{\rm AB} = M_{\rm AB} + 5 \log_{10} \left( \frac{d_L(z)}{10\,{\rm pc}} \right) - 2.5 \log_{10}(1 + z) . \qquad (14) $$
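
The conversion of Eq. (14) is illustrated by the short sketch below. It assumes the luminosity distance is supplied in Mpc (a co-moving distance in h^-1 Mpc would first need to be divided by h) and uses the flat-Universe relation d_L = r_c (1 + z); the function names are hypothetical.

```python
import math


def luminosity_distance(r_c, z):
    """Luminosity distance for a flat Universe, d_L = r_c (1 + z)."""
    return r_c * (1.0 + z)


def apparent_magnitude(abs_mag_obs_frame, d_l_mpc, z):
    """Observer-frame apparent AB magnitude from Eq. (14).

    abs_mag_obs_frame: observer-frame absolute magnitude M_AB
    d_l_mpc: luminosity distance in Mpc
    """
    d_l_pc = d_l_mpc * 1.0e6  # convert Mpc to pc
    return (abs_mag_obs_frame
            + 5.0 * math.log10(d_l_pc / 10.0)
            - 2.5 * math.log10(1.0 + z))
```

The final term is the band-stretching factor that arises because the absolute magnitude is defined in the observer frame rather than the rest frame.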



Due to the large number of galaxies modelled by GALFORM, 
the full SED of each galaxy is not stored. Instead, the luminosity is 
computed in a set of filters specified at run time. Hence the definition
of the filter response in the galaxy rest frame, R(ν_e/(1 + z)),
is tied to the output redshifts of the simulation snapshots and the
k-correction applied does not correspond to the redshift of the galaxy
in the lightcone. This discrepancy leads to visible discontinuities in
distributions involving the photometric properties of the galaxies,
such as galaxy colours versus redshift, as shown in the upper panel
of Fig. 4. The breaks apparent in the distribution correspond to the
redshifts of the simulation snapshots. 

As discussed in the previous section, the complex time dependence
of galaxy luminosity means that we cannot simply interpolate
the absolute magnitudes. However, since the size of the
wavelength shift applied to a filter depends only on the redshift of
the galaxy (and not on any of its intrinsic properties), we can
apply a correction to all the observer-frame absolute magnitudes
(and dust emission luminosities) to take into account the redshift
of lightcone crossing. Consider again a galaxy, j, that is originally
found in the snapshot, i, at redshift z_i, and which has a descendent
in snapshot i + 1, at a redshift z_{i+1} < z_i. Assume that the galaxy
has an observer-frame absolute magnitude M_j(z_i). Since the wavelength
shift applied depends only on the redshift of a galaxy, we can
easily compute, within the GALFORM code, the observer-frame absolute
magnitude that j would have if placed at the redshift of its
descendent in snapshot i + 1, i.e. M_j(z_{i+1}), but with a
star-formation history computed only up to t_i. If the galaxy, j, enters
the lightcone at an intermediate epoch, z, then we can interpolate
between M_j(z_i) and M_j(z_{i+1}) to estimate M_j(z). Note that by
interpolating the magnitudes (and luminosities) in this way, we have not
changed the shape of the SED of the galaxy, but rather have applied a
further systematic wavelength shift to the galaxy SED. As can be seen
in the lower panel of Fig. 4, this correction, which was also applied
by Blaizot et al. (2005) and Kitzbichler & White (2007), smooths
out the 'saw-tooth' pattern seen in the colour distribution.
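The band-shift correction described above can be sketched as follows. The text does not state which variable the interpolation is carried out in, so this example assumes a simple linear interpolation in redshift; the function and argument names are hypothetical.

```python
def interpolate_magnitude(mag_at_zi, mag_at_zi1, z_i, z_i1, z_cross):
    """Estimate M_j(z) at the lightcone-crossing redshift z_cross.

    mag_at_zi  : M_j(z_i), observer-frame magnitude at snapshot i
    mag_at_zi1 : M_j(z_{i+1}), the same galaxy (star-formation history
                 frozen at t_i) with the filters shifted to z_{i+1}
    z_i, z_i1  : snapshot redshifts, with z_i1 < z_i
    z_cross    : redshift of lightcone crossing, z_i1 <= z_cross <= z_i

    Assumption: linear interpolation in redshift.
    """
    if not z_i1 < z_i:
        raise ValueError("expected z_i1 < z_i")
    if not z_i1 <= z_cross <= z_i:
        raise ValueError("z_cross must lie between the two snapshots")
    weight = (z_i - z_cross) / (z_i - z_i1)  # 0 at z_i, 1 at z_{i+1}
    return mag_at_zi + weight * (mag_at_zi1 - mag_at_zi)
```

Because both input magnitudes describe the same stellar population, only the filter shift varies between them, which is why a smooth interpolation is reasonable here but not for the intrinsic luminosity evolution.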

We can also calculate an observed redshift for the mock galaxies,
emulating the measurement that would be taken from a galaxy
spectrum using one or more identified emission lines. The observed
redshift, z_obs, of a galaxy, which includes the cosmological redshift
due to the Hubble flow as well as a component due to the local
peculiar motion of the galaxy, is defined by,

z_{obs} = (1+z) \left( 1 + \frac{v_r}{c} \right) - 1,    (15)

where z is the cosmological redshift at which the galaxy enters the
lightcone, c is the speed of light and v_r is the radial component of
the peculiar velocity vector, v, of the galaxy (i.e. v_r = v · r̂,
where r̂ is the normalised line-of-sight position vector of the
galaxy). We do not at present include any calculation of photometric
redshifts, or their uncertainties, in GALFORM or our lightcone code.
These properties can be readily calculated in post-processing using the
photometry output for each galaxy.
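A short sketch of the observed-redshift calculation is given below. It assumes the usual low-velocity relation 1 + z_obs = (1 + z)(1 + v_r/c), with velocities in km/s and the observer at the origin; the function name and argument layout are illustrative only.

```python
import math

C_KM_S = 299792.458  # speed of light in km/s


def observed_redshift(z_cos, velocity, position):
    """Observed redshift including the line-of-sight peculiar velocity.

    z_cos    : cosmological redshift at lightcone crossing
    velocity : peculiar velocity vector (vx, vy, vz) in km/s
    position : position vector of the galaxy relative to the observer
               (any length unit; only its direction is used)
    """
    norm = math.sqrt(sum(x * x for x in position))
    if norm == 0.0:
        raise ValueError("galaxy coincides with the observer")
    # Radial component: projection of v onto the unit line-of-sight vector.
    v_r = sum(v * x for v, x in zip(velocity, position)) / norm
    return (1.0 + z_cos) * (1.0 + v_r / C_KM_S) - 1.0
```

A velocity that is purely transverse to the line of sight leaves the redshift unchanged, since its projection onto the line-of-sight vector is zero.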



4.6 Applying the survey criteria 

The final stage in constructing a mock catalogue is to apply the ra- 
dial selection criteria of the survey being mimicked and reject those 
galaxies fainter than the specified flux limits. For many surveys this 
involves placing a cut on the flux at one or more wavelengths or an 
apparent magnitude limit in one or more photometric bands. We 
can select galaxies according to any intrinsic or observed galaxy 
property. For example, besides generating flux limited lightcone 
catalogues, we are able to construct catalogues limited by stellar 
mass, atomic hydrogen mass or even halo mass. Given a list of se- 
lection criteria, we can control whether a galaxy must pass just one 
or all of these criteria simultaneously in order to be included in the 
final catalogue. Note that the lightcone catalogues that we produce 
correspond to ideal surveys, i.e. we do not apply any completeness 
masks or simulate the loss of galaxies due to poor observing con- 
ditions, fibre collisions or quality of spectra. Such completeness 
effects are survey specific and can be applied to the catalogues in 
post-processing. 
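The "any or all" logic for combining selection criteria can be illustrated with a small sketch. The dictionary keys, thresholds and function name below are hypothetical and chosen only for the example; they do not reflect the interface of the lightcone code.

```python
def passes_selection(galaxy, criteria, require_all=True):
    """Return True if the galaxy satisfies the list of criteria.

    galaxy      : mapping of property name to value
    criteria    : list of callables, each returning True or False
    require_all : if True the galaxy must pass every criterion,
                  otherwise passing any single criterion is enough
    """
    results = (criterion(galaxy) for criterion in criteria)
    return all(results) if require_all else any(results)


# Example criteria: an apparent magnitude limit and a stellar mass limit.
criteria = [
    lambda g: g["K_AB"] <= 24.0,
    lambda g: g["stellar_mass"] >= 1.0e9,
]

galaxy = {"K_AB": 23.0, "stellar_mass": 5.0e8}
in_strict_catalogue = passes_selection(galaxy, criteria, require_all=True)
in_loose_catalogue = passes_selection(galaxy, criteria, require_all=False)
```

Here the example galaxy passes the flux limit but not the mass limit, so it enters the catalogue only when a single passing criterion is sufficient.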
Figure 6. The shading shows the transmission curves of the B, z and K
filters used by Daddi et al. (2004a). Also shown are the synthetic spectra
(plotted as luminosity per unit wavelength) for two galaxies at z = 2. The
spectra were obtained using PEGASE.2 (Fioc & Rocca-Volmerange 1999),
assuming a Kennicutt (1983) IMF, and a single instantaneous burst of star
formation. The spectra are shown for two different epochs, when the stellar
population has an age of 300 Myr (blue) and an age of 3000 Myr (red) and
for a sub-solar (thin line) and a super-solar (thick line) metallicity. The flux
and transmission units are arbitrary and the spectra have been normalised
so as to be visible on similar scales to the transmission curves.



5 APPLICATION TO THE BzK COLOUR SELECTION 

In this section we use a lightcone mock catalogue of 50 deg^2, built
using the Bower et al. (2006) GALFORM model, to study the properties
of galaxies selected using the BzK colour technique. We have
constructed the lightcone by selecting all galaxies with K_AB ≤ 24.



5.1 The BzK selection 

The BzK colour selection is designed to identify galaxies in the
redshift interval 1.4 < z < 2.5 based upon their location in the
(B − z) vs. (z − K) colour plane (Daddi et al. 2004a). Additionally,
the selection is advertised as being able to separate star-forming
galaxies from those that are passively evolving.

Daddi et al. identified star-forming galaxies at z > 1.4 (referred
to as star-forming BzK, or sBzK galaxies) using the criterion

BzK ≥ −0.2,    (16)

where BzK = (z − K)_AB − (B − z)_AB. This condition is indicated
by the solid black line in Fig. 5. The sBzK region lies above
this line, as labelled in the upper left panel of Fig. 5.

To select galaxies at z > 1.4 that are passively evolving (referred
to as passive BzK, or pBzK galaxies), Daddi et al. proposed
applying the following conditions:

BzK < −0.2 and (z − K)_AB > 2.5.    (17)



The pBzK galaxies populate the region between the solid and
dashed lines in Fig. 5, i.e. the upper right region of the (B − z)
vs. (z − K) colour plane.
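The two colour cuts can be combined into a single classification routine. The sketch below applies the sBzK criterion (BzK ≥ −0.2) and the pBzK criteria (BzK < −0.2 and (z − K)_AB > 2.5) to AB magnitudes; the function name and the returned labels are illustrative choices.

```python
def classify_bzk(b_ab, z_ab, k_ab):
    """Classify a galaxy from its B, z and K AB magnitudes.

    Returns "sBzK", "pBzK" or "non-BzK".
    """
    z_minus_k = z_ab - k_ab
    b_minus_z = b_ab - z_ab
    bzk = z_minus_k - b_minus_z
    if bzk >= -0.2:
        return "sBzK"      # star-forming region, Eq. (16)
    if z_minus_k > 2.5:
        return "pBzK"      # passive region, Eq. (17)
    return "non-BzK"       # excluded by both cuts
```

For example, a galaxy with (B − z) = 0.5 and (z − K) = 1.5 has BzK = 1.0 and is classified as sBzK, while one with (B − z) = 4 and (z − K) = 3 has BzK = −1 and is classified as pBzK.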

The BzK selection works by using colours that sample key
features in the spectral energy distributions of galaxies at 1.4 <
z < 2.5, mainly the rest-frame 4000 Å break and the UV continuum
slope. Fig. 6 shows the synthetic spectra for two galaxies at
z = 2, obtained using the PEGASE.2 stellar population synthesis
code (Fioc & Rocca-Volmerange 1999). The red lines correspond to
a galaxy that is dominated by an old stellar population and thus
exhibits a prominent break around 4000 Å (in the rest frame), which
is created by the build-up of the absorption lines of ionised metals.
Over 1.4 < z < 2.5 the break moves across the observed
wavelength range ∼ 9000 − 15000 Å, between the z- and K-bands.
In this redshift range, we find that the (z − K) colours of model
galaxies become monotonically redder with increasing strength of
the 4000 Å break.

At z ≲ 1.4, the continuum longwards of the 4000 Å break
is shifted into the z-band, resulting in galaxies at these redshifts
having bluer (z − K) colours (and, for a fixed B-band flux, redder
(B − z) colours). The BzK criteria are therefore designed to
exclude these galaxies, which lie to the right of the solid line and
below the dashed line in Fig. 5. However, as we will see in § 5.4,
the finite widths of both the break and the z-band filter mean that
we would expect some contamination to occur, as well as the loss
of some galaxies within the target redshift interval, 1.4 < z < 2.5.

As can be seen in Fig. 6, the 4000 Å break is stronger for
galaxies with old stellar populations or high stellar metallicity
(Kauffmann et al. 2003; Kriek et al. 2006, 2011). As such, we
would expect old, metal rich galaxies to display redder (z − K)
colours. The lower two panels in Fig. 5 show the predicted variation
of metallicity (left) and stellar mass weighted age (right)
within the (B − z) vs. (z − K) plane. The predicted trends with
(z − K) colour agree with the expectations for the variation of the
4000 Å break with both age and metallicity. However, these trends
are weakened by the effect of dust in young galaxies, which reddens
the (z − K) colour.

To isolate young, metal poor galaxies at 1.4 < z < 2.5 that
have not yet developed a strong 4000 Å break, another spectral feature
is required. Young, star-forming galaxies have steep UV continua
due to the presence of bright, young, hot stars. At z ∼ 2
the UV continuum is shifted into the optical, as shown in Fig. 6.
The presence of the steep UV slope boosts the B-band flux of these
galaxies, leading them to have very blue (B − z) colours, as can
be seen in the lower right panel of Fig. 5. The correlation between
UV luminosity, due to young stars, and star-formation rate, SFR,
suggests that we would also expect a correlation between the SFR
of a galaxy and its (B − z) colour for a given K-band limit. Such a
trend is clearly visible in the upper right panel of Fig. 5.



^ We shall refer to the combined sBzK and pBzK galaxy population as 
BzK galaxies. 



5.2 Predicted numbers of BzK galaxies 

Since the BzK technique is used to select a subsample of K-band
(or B-band) selected galaxies, we first inspect the predicted total
number counts of all galaxies in the B and K-bands in the mock
catalogue, which are shown by the solid lines in Fig. 7. We remind
the reader that our mock catalogue has a solid angle of 50 deg^2
and that galaxies were selected with K_AB ≤ 24. In both bands, the
GALFORM mock catalogue provides a reasonable match to the observed
counts. The B-band counts are in excellent agreement with
the observed numbers, though they turn over at B_AB ∼ 25 due to the
K_AB ≤ 24 selection used to construct the lightcone.

As a sanity check, the dotted lines in Fig. 7 show the differential
number counts obtained by integrating the GALFORM galaxy
luminosity function over co-moving volume. The excellent agreement
between the counts computed using the luminosity functions
directly from the snapshot outputs and those from the lightcone
demonstrates the success of the magnitude interpolation scheme
used to create the lightcone. The light grey shaded region in Fig. 7
shows the spread (10 to 90 percentile range) in the counts for 100
separate fields each with a solid angle of 1 deg^2, a solid angle typical
of the observational datasets with which we are comparing. These fields
were generated by randomly selecting field centres within the footprint
of the lightcone, with a buffer zone to avoid placing field centres
too close to the edge of the footprint. In the right-hand panel of
Fig. 7 the counts computed from the luminosity function diverge
from the predicted counts in the lightcone due to the K-band limit
used to construct the lightcone (whereas the integral over the
luminosity function is independent of this limit).
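The field-to-field spread can be estimated along the following lines. This is only a sketch under simplifying assumptions that are not taken from the text: a square, flat-sky footprint of 50 deg^2, square 1 deg^2 fields, a buffer of half a field width, uniformly scattered toy galaxies and a nearest-rank percentile. All function names are hypothetical.

```python
import math
import random


def draw_field_centres(n_fields, footprint_side, field_side, rng):
    """Draw random field centres, keeping a buffer from the footprint edge."""
    buffer_width = 0.5 * field_side
    lo, hi = buffer_width, footprint_side - buffer_width
    return [(rng.uniform(lo, hi), rng.uniform(lo, hi)) for _ in range(n_fields)]


def count_in_field(positions, centre, field_side):
    """Count galaxies falling inside a square field around the centre."""
    half = 0.5 * field_side
    cx, cy = centre
    return sum(1 for x, y in positions
               if abs(x - cx) <= half and abs(y - cy) <= half)


def percentile_nearest_rank(values, q):
    """Nearest-rank percentile (q in per cent) of a list of values."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


rng = random.Random(42)
side = math.sqrt(50.0)  # side of an assumed square 50 deg^2 footprint
galaxies = [(rng.uniform(0.0, side), rng.uniform(0.0, side)) for _ in range(5000)]
centres = draw_field_centres(100, side, 1.0, rng)
counts = [count_in_field(galaxies, centre, 1.0) for centre in centres]
spread = (percentile_nearest_rank(counts, 10), percentile_nearest_rank(counts, 90))
```

In practice the counts would be binned in magnitude and the percentiles taken per bin; the single total count per field used here keeps the example short.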

In the left-hand panel of Fig. 8 we show the number counts
of all BzK galaxies with K_AB ≤ 24 from the mock catalogue
(solid line). Overall the mock provides a reasonable match to the
observed counts. At faint magnitudes (K_AB ≳ 22), the turnover
in the observed BzK counts is sharper than predicted. However,
in this region the observations could be incomplete. The closest
agreement between the predictions and observations occurs for
K_AB ∼ 20.5 − 22.0, where there is a clear change of slope in both
the observations and the GALFORM predictions. At K_AB ∼ 21,
∼ 1/6 of both observed and predicted K-band selected galaxies
are also BzK galaxies. Brightwards of K_AB ∼ 19.5, the predicted
BzK counts exceed the counts for K-band selected galaxies within
1.4 < z < 2.5 (shown by the dotted line) due to low redshift
interlopers (see § 5.4.2). In Fig. 8 the light grey shaded region
again shows the 10 − 90 percentile spread in the differential number
counts of BzK selected galaxies in 100 separate fields, each
with a solid angle of 1 deg^2. At bright magnitudes, the extent of
this shaded region indicates that the spread in the observed counts
can be explained as sampling variance arising from the small solid
angles probed.

We now consider the predicted number counts for the subsamples
of sBzK and pBzK galaxies, shown in the middle and right
panels of Fig. 8. For faint fluxes (K_AB ≳ 21), sBzK galaxies, both
observed and predicted, dominate the BzK population due to the
turnover in the pBzK counts that can be clearly seen in the
right-hand panel of Fig. 8.

The GALFORM sBzK number counts show good overall
agreement with the observations. However, the model over-predicts
the number of the faintest sBzK galaxies. This may simply be
the result of the observed sBzK counts becoming incomplete at
faint magnitudes. The predicted number counts of pBzK galaxies
are in reasonable agreement with observations in the range
19.8 ≲ K_AB ≲ 20.8. However, the model under-predicts the
number of brighter pBzK galaxies and over-predicts the number
of fainter galaxies. This mismatch between semi-analytical predictions
and observed pBzK number counts has previously been
shown by McCracken et al. (2010), who compared their observational
counts to the predictions of the mock catalogues of Kitzbichler
& White (2007). Moreover, the Kitzbichler & White model gave
a poorer match to the observed sBzK counts than we find.

Figure 7. Predicted K_AB-band (left) and B_AB-band (right) differential number counts for all K_AB ≤ 24 selected galaxies in the lightcone constructed using
the Bower et al. (2006) model (solid lines). The dotted lines show the number counts calculated by integrating the GALFORM galaxy luminosity function over
co-moving volume. The latter uses a single band limit, hence the discrepancy with the counts from the lightcone in the B-band. Also shown are observationally
estimated K-band number counts from Saracco et al. (2001); Vandame et al. (2001); Iovino et al. (2005); Metcalfe et al. (2006); Kong et al. (2006); Lane et al.
(2007); Hartley et al. (2008); Keenan et al. (2010); McCracken et al. (2010); Bielby et al. (2011) and B-band number counts from Lilly et al. (1991); Ferguson
et al. (2000); Arnouts et al. (2001); McCracken et al. (2003); Kashikawa et al. (2004); Capak et al. (2004); Rovilos et al. (2009). In the left-hand panel, the
light grey shaded region shows the 10 − 90 percentile spread in the K_AB-band differential number counts for 100 separate 1 deg^2 lightcone fields.

Figure 8. Predicted K_AB-band differential number counts for all BzK (left), sBzK (middle) and pBzK selected galaxies (right) with K_AB ≤ 24 in the
lightcone catalogue (solid lines). The dashed lines show the predicted number counts when a B-band detection limit of B_AB ≤ 28 is considered in addition
to the K-band limit (see § 5.2; in the left-hand panel, this line lies underneath the solid one). The dot-dashed lines show the BzK number counts when extinction
due to dust is omitted (see § 5.5.5). In the left-hand panel, the dotted line shows the counts for all K_AB ≤ 24 selected galaxies within 1.4 < z < 2.5. In the
middle panel the dotted line corresponds to the counts of galaxies with NUV − r < 3.5, K_AB ≤ 24 and 1.4 < z < 2.5 and in the right-hand panel the
dotted line corresponds to the counts of galaxies with NUV − r ≥ 3.5, K_AB ≤ 24 and 1.4 < z < 2.5 (see § 5.2 for further details on the colour cut). Also
shown are observationally estimated number counts from Reddy et al. (2005); Kong et al. (2006); Lane et al. (2007); Blanc et al. (2008); Hartley et al. (2008);
Imai et al. (2008); McCracken et al. (2010); Bielby et al. (2011). In the left-hand panel, the light grey shaded region shows the 10 − 90 percentile spread in
the K_AB-band differential number counts of BzK galaxies in 100 separate 1 deg^2 lightcone fields.

The turnover at faint magnitudes in the counts of pBzK selected
galaxies has been reported by several authors (e.g. Lane
et al. 2007; Hartley et al. 2008; McCracken et al. 2010; Bielby et al.
2011). The model also displays a turnover in the pBzK counts, but
at K_AB ∼ 21, ∼ 1 mag fainter than in the data. Both Hartley et al.
(2008) and McCracken et al. (2010) propose that limited B-band
photometry is responsible for the turnover. Hartley et al. showed
that reclassifying ∼ 34 per cent of their sBzK galaxies as pBzK
galaxies would be sufficient to remove the turnover (we will return
to this point in § 5.4.3). We demonstrate the impact of the depth
of the B-band photometry by recalculating the predicted counts of
BzK, sBzK and pBzK galaxies assuming a B-band detection limit
of B_AB = 28 in addition to the K-band limit of K_AB ≤ 24. Any
galaxy in the mock catalogue with a B-band magnitude fainter than
this is assumed to be undetected in B and its (B − z) colour is
calculated assuming B_AB = 28. This is the approach typically used
in observational catalogues to estimate the colours of objects that
are undetected in a band. The effect this has on the counts is shown
by the dashed lines in Fig. 8. Although the predicted pBzK counts
are still not in full agreement with the data, applying a B-band limit
reduces the mismatch. With the B-band limit applied, ∼ 50 per
cent of the pBzK galaxies are relabelled as sBzK galaxies, preferentially
at the faintest K-band magnitudes. This supports the conclusion
that shallow B-band photometry contributes to the turnover.
Hartley et al. and McCracken et al. both observed the turnover for
K-band limited samples down to K_AB ∼ 23.5 and K_AB ∼ 23
respectively, with B-band detection limits of B_AB,lim = 28.4 and
B_AB,lim = 29.1 respectively. The predicted excess of faint pBzK
galaxies is also partially a result of the Bower et al. (2006) model
predicting too many red galaxies. A substantial fraction of pBzK
galaxies are satellites. These galaxies could be too red due to the
treatment of gas stripping in satellite subhalos (see Font et al.
2008). If we plot the predicted pBzK number counts considering
only central galaxies (without applying any B-band detection
limit), we find that the predicted excess of faint pBzK galaxies is
reduced, leading to excellent agreement with the observed counts.
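The effect of a B-band detection limit on the classification can be reproduced with a few lines. In this sketch a galaxy fainter than the limit is assigned B_AB equal to the limit before the colours are computed, as described in the text; the function names and the example magnitudes are illustrative assumptions.

```python
def classify_bzk(b_ab, z_ab, k_ab):
    """Apply the sBzK and pBzK colour cuts to AB magnitudes."""
    z_minus_k = z_ab - k_ab
    bzk = z_minus_k - (b_ab - z_ab)
    if bzk >= -0.2:
        return "sBzK"
    if z_minus_k > 2.5:
        return "pBzK"
    return "non-BzK"


def clip_to_b_limit(b_ab, b_lim):
    """Treat galaxies fainter than b_lim as undetected, with B_AB = b_lim."""
    return min(b_ab, b_lim)


# Hypothetical red galaxy, intrinsically very faint in the B-band.
b_ab, z_ab, k_ab = 30.0, 25.0, 22.0
deep_class = classify_bzk(b_ab, z_ab, k_ab)
shallow_class = classify_bzk(clip_to_b_limit(b_ab, 28.0), z_ab, k_ab)
```

With its true B-band magnitude the example galaxy has BzK = −2 and (z − K) = 3, placing it in the pBzK region. With B_AB clipped to 28 its (B − z) colour becomes bluer, BzK rises to 0, and it is relabelled as an sBzK galaxy.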

The BzK criterion is not the only technique used observationally
to classify galaxies as star-forming at z ∼ 2. The rest-frame
near-UV/optical colour, NUV − r, can also be used to separate
star-forming and passive galaxies. Following Ilbert et al.
(2010), we divide the K-band selected GALFORM galaxies lying
within 1.4 < z < 2.5 into star-forming galaxies (i.e. those with
NUV − r < 3.5) and passively evolving galaxies (NUV − r ≥
3.5) and calculate their number counts. As shown in the middle
panel of Fig. 8, the predicted sBzK counts are consistently somewhat
higher than those predicted for galaxies with a blue NUV − r
colour in 1.4 < z < 2.5. We note, however, that low-redshift
contamination will exaggerate the counts in the brightest bins. The
predicted number counts of pBzK galaxies are systematically below
the counts of passive galaxies estimated using the NUV − r
colour. This highlights the sensitivity of the separation of galaxies
into star-forming and passively evolving classes to the precise
colour criteria used.

The mock catalogue is able to reproduce the combined number
counts of all BzK galaxies, as well as providing reasonable
agreement with the counts of K-band selected galaxies within
1.4 < z < 2.5. Although the model is able to reproduce the observed
number counts of sBzK galaxies (which dominate the BzK
population), it is unable to reproduce the observed number counts
of pBzK galaxies.

Figure 9. The predicted redshift distributions of BzK selected galaxies in
the mock catalogue (black solid line) for two different K-band flux limits:
K_AB ≤ 21 (top) and K_AB ≤ 23 (bottom). For comparison, the redshift
distribution of all K-band selected mock galaxies down to these limits is
shown by the grey shaded region. The limits of the redshift range which the
BzK technique was designed to probe are indicated by the vertical dotted
lines. Blue dashed and red dotted histograms show the redshift distributions
of sBzK and pBzK galaxies respectively. In the bottom panel are plotted
observed redshift distributions for BzK galaxy samples with K_AB ≤ 23.8
and K_AB ≤ 22.9 from Grazian et al. (2007) and Quadri et al. (2007)
respectively.



5.3 Predicted redshift distribution of BzK galaxies 

In Fig. 9 we show the predicted redshift distributions for BzK
galaxies in the GALFORM mock catalogue for two example K-band
flux limits, K_AB ≤ 21 and 23. For comparison, we also
show the redshift distribution for all model galaxies brighter than
the stated K-band limit, and use vertical dotted lines to indicate
the redshift range which the BzK technique is designed to probe,
1.4 < z < 2.5.

It is clear from Fig. 9 that BzK galaxies probe the high redshift
tail of the redshift distribution of K-selected galaxies. For example,
the predicted median redshift of the K_AB ≤ 23 sample is
z_med ∼ 1.2, while the BzK subsample has a higher median redshift
of z_med ∼ 1.9. Moreover, ∼ 98 per cent of the GALFORM galaxies
within 2.0 < z < 2.5 are selected as BzK galaxies. However, in the
redshift range 1.4 < z < 2, the fraction of galaxies selected by the
BzK technique decreases with decreasing redshift. For K_AB ≤ 21
the fraction of galaxies at z = 1.4 that are recovered is ∼ 20 − 25
per cent, compared to ∼ 50 per cent for K_AB ≤ 23. For K_AB ≤ 21
the fraction of galaxies recovered reaches 50 per cent at z ∼ 1.55.

In the lower panel of Fig. 9 we compare the redshift distribution
for our BzK selected galaxy sample with observed photometric
redshift distributions from Grazian et al. (2007) and Quadri
et al. (2007), selected with K_AB ≤ 23.8 and K_AB ≤ 22.9
respectively. The predicted BzK redshift distribution has a median
redshift, z_med ≈ 1.8, that is consistent with the median redshifts of
the observed distributions, 1.7 ≲ z_med ≲ 1.9. As we have seen in
the left-hand panel of Fig. 8, the GALFORM model over-predicts the
number counts of BzK galaxies at K_AB = 23 and so, understandably,
for all redshift bins within 1.4 < z < 2.5, the mock catalogue
predicts a greater number of BzK galaxies than is observed.

We can see from Fig. 9 that the redshift distribution of sBzK
galaxies consistently peaks at lower redshifts than the pBzK
distribution. This can also be seen in Fig. 10, which shows the predicted
large-scale distribution of K_AB ≤ 21 galaxies and the
subsamples of sBzK and pBzK galaxies. Fig. 10 shows that, while
sBzK galaxies can be selected at redshifts down to z ∼ 0, pBzK
galaxies only start to appear at z ∼ 1.4. In Fig. 10 we can also
see that at z ∼ 2 the pBzK galaxies appear to trace filamentary
structures, whereas the sBzK galaxies appear to be less
clustered. Only for fainter limits of K_AB ≤ 23 do sBzK galaxies
begin to trace the filamentary structure at z ∼ 2. This suggests that
the predicted spatial clustering of pBzK galaxies is stronger than
that for sBzKs, in agreement with observations (e.g. Kong et al.
2006; Hartley et al. 2008).

5.4 Efficiency of the BzK selection 

The BzK technique was designed to select galaxies within 1.4 <
z < 2.5 and to separate them into star-forming and passively
evolving subsamples. To assess the effectiveness with which the
BzK technique achieves these goals, we consider the completeness
(§ 5.4.1) and contamination (§ 5.4.2) of a BzK galaxy sample
selected from the GALFORM mock catalogue.



5.4.1 K-band completeness

In this section we explore the fraction of K-band selected galaxies
at 1.4 < z < 2.5 that are actually picked up when using the BzK
selection technique for galaxies in the GALFORM mock catalogue.
For this purpose we compare the predicted number counts of BzK
galaxies, presented in the left-hand panel of Fig. 8, with the total
number counts of K_AB-band selected galaxies that lie within the
target redshift range, shown by the dotted line in the same panel
of Fig. 8. Faintwards of K_AB ∼ 19.5 the predicted BzK counts
are in good agreement with the counts of 1.4 < z < 2.5 galaxies,
indicating that the BzK selection is an effective probe of the galaxy
population at this epoch.

In Fig. 11 we show the (B − z) vs. (z − K) plane populated
by GALFORM galaxies within three different redshift regimes,
z ≤ 1.4 (left column), 1.4 < z < 2.5 (middle column) and
z ≥ 2.5 (right column). The distribution is shown for our two
example K-band flux limits: K_AB ≤ 21 (top row) and K_AB ≤ 23
(bottom row). We define the completeness of the BzK technique as
the fraction of all galaxies in 1.4 < z < 2.5 that lie in either of the
BzK regions in the (B − z) vs. (z − K) plane. About a quarter of
the galaxies brighter than K_AB = 21 within 1.4 < z < 2.5 lie outside
the BzK regions. The distribution has two clear peaks, one
at (B − z) ∼ 0, which we will refer to as the star-forming peak,
and the other at (B − z) ∼ 5, which we will refer to as the passively
evolving peak. The star-forming peak falls well within the
sBzK region, while the passively evolving peak lies just outside the
pBzK region. This would explain the under-prediction of the pBzK
number counts for K_AB ≲ 20. However, for K_AB ≤ 23, the
star-forming peak dominates the galaxy population, suggesting that for
fainter K-band limits the BzK selection provides a more complete
sample of the 1.4 < z < 2.5 galaxy population.

The completeness of the BzK technique, as a function of
the limiting K-band magnitude of the galaxy sample, is shown in
Fig. 12. Here, the data points show the BzK completeness estimates
from Bielby et al. (2011), who applied the BzK selection to an input
catalogue of ∼ 1.8 million K-band galaxies, with photometric
redshifts (σ_{Δz/(1+z)} ≲ 0.03), from the WIRCam Deep Survey
(WIRDS). We can clearly see in Fig. 12 that the BzK completeness
increases with fainter K-band limiting magnitude. The same
trend is seen for the completeness predictions for GALFORM galaxies,
shown by the solid line, with ∼ 55, ∼ 73 and ∼ 80 per cent of
1.4 < z < 2.5 galaxies being recovered for K_AB ≤ 21, 22 and 23
respectively. Therefore, for faint K-band limits (K_AB,lim ≳ 22),
the BzK technique is consistently selecting 75 to 80 per cent of the
galaxies within 1.4 < z < 2.5. However, for a very bright limit of
K_AB ≤ 20 the technique identifies less than half of the galaxy
population within the target redshift range. For 21 ≲ K_AB,lim ≲ 22
the completeness estimates from the GALFORM mock catalogue are
in very good agreement with the WIRDS estimates.
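The completeness defined above can be computed from a catalogue along the following lines. The galaxy records are represented here as dictionaries with hypothetical keys ("K_AB", "z", "bzk_class"); this is a sketch of the definition, not the code used to produce Fig. 12.

```python
def bzk_completeness(galaxies, k_lim, z_min=1.4, z_max=2.5):
    """Fraction of K-limited galaxies in z_min < z < z_max selected as BzK.

    galaxies : iterable of dicts with keys "K_AB", "z" and "bzk_class",
               where "bzk_class" is "sBzK", "pBzK" or "non-BzK"
    k_lim    : limiting K-band AB magnitude of the sample
    """
    target = [g for g in galaxies
              if g["K_AB"] <= k_lim and z_min < g["z"] < z_max]
    if not target:
        return float("nan")
    selected = sum(1 for g in target if g["bzk_class"] in ("sBzK", "pBzK"))
    return selected / len(target)


# Toy catalogue: four galaxies in the target range, three selected.
toy_catalogue = [
    {"K_AB": 22.5, "z": 1.8, "bzk_class": "sBzK"},
    {"K_AB": 22.0, "z": 2.2, "bzk_class": "pBzK"},
    {"K_AB": 21.0, "z": 1.5, "bzk_class": "non-BzK"},
    {"K_AB": 22.8, "z": 2.0, "bzk_class": "sBzK"},
    {"K_AB": 22.0, "z": 0.9, "bzk_class": "sBzK"},  # outside the target range
]
toy_completeness = bzk_completeness(toy_catalogue, k_lim=23.0)
```

Evaluating the function for a sequence of limiting magnitudes gives the completeness as a function of K_AB,lim.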

5.4.2 Contamination 

We now explore the predicted numbers of galaxies outside the
redshift range, 1.4 < z < 2.5, that are picked up by the BzK selection.
As we can see from Fig. 11, it is never possible to obtain a sample
of BzK selected galaxies that is entirely free of contamination from
interlopers at low redshift, z ≤ 1.4, or high redshift, z ≥ 2.5, which are
classified as BzK galaxies. The left-hand column of Fig. 11 shows
that low redshift interlopers are typically classified as sBzK galaxies,
while the (z − K) cut used in Eq. 17 successfully eliminates
low redshift pBzK galaxies. High redshift interlopers, shown in the
right-hand column of Fig. 11, appear to be more evenly distributed
between the sBzK and pBzK regions and are thus more difficult to
remove.

The fractions of low and high redshift interlopers, as a function
of K-band limiting magnitude, are shown in Fig. 12 by the
dashed and dotted lines respectively. From Fig. 12 we can see that
by applying a bright K_AB ≤ 20 selection to our GALFORM mock
catalogue, the BzK technique selects approximately equal numbers
of galaxies with 1.4 < z < 2.5 and z ≤ 1.4. Pushing the K-band
selection limit to fainter magnitudes leads to a decrease in the
fraction of low redshift contamination as an increasing number of
galaxies within 1.4 < z < 2.5 become visible at fainter K-band
limits. Fig. 9 shows clearly how the redshift distribution of BzK
galaxies develops a sharper low redshift cut-off as the flux limit is
made fainter. By K_AB,lim ∼ 21.5, the low redshift contamination
has fallen to ∼ 18 per cent. For fainter flux limits the low redshift
contamination decreases much more slowly, reaching ∼ 10
per cent by K_AB,lim ∼ 24.

Figure 10. Wedge plots showing a slice in redshift and right ascension, 1° wide in declination, of the predicted distribution of all galaxies with K_AB ≤ 21
(top) and the subsamples of sBzKs (middle) and pBzKs (bottom).



As expected, the fraction of high redshift interlopers increases
steadily with increasingly faint limiting K-band magnitude, though
it stays well below ∼ 20 per cent. By K_AB,lim ∼ 23.2, the
contributions from low redshift and high redshift contamination are
approximately equal at ∼ 12 per cent, with high redshift interlopers
dominating the contamination at fainter limiting magnitudes.
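The interloper fractions can be sketched in the same way. Here each fraction is taken relative to all BzK selected galaxies brighter than the K-band limit, which is an assumption about the normalisation; the dictionary keys are hypothetical.

```python
def interloper_fractions(galaxies, k_lim, z_min=1.4, z_max=2.5):
    """Low and high redshift interloper fractions of a BzK sample.

    galaxies : iterable of dicts with keys "K_AB", "z" and "bzk_class"
    Returns (low_fraction, high_fraction), each normalised by the total
    number of BzK selected galaxies with K_AB <= k_lim (an assumed
    normalisation).
    """
    sample = [g for g in galaxies
              if g["K_AB"] <= k_lim and g["bzk_class"] in ("sBzK", "pBzK")]
    if not sample:
        return float("nan"), float("nan")
    n_low = sum(1 for g in sample if g["z"] <= z_min)
    n_high = sum(1 for g in sample if g["z"] >= z_max)
    return n_low / len(sample), n_high / len(sample)


# Toy BzK sample: one low redshift and one high redshift interloper.
toy_sample = [
    {"K_AB": 20.5, "z": 0.8, "bzk_class": "sBzK"},
    {"K_AB": 21.5, "z": 1.9, "bzk_class": "sBzK"},
    {"K_AB": 22.0, "z": 2.1, "bzk_class": "pBzK"},
    {"K_AB": 22.5, "z": 3.0, "bzk_class": "sBzK"},
    {"K_AB": 22.5, "z": 1.0, "bzk_class": "non-BzK"},  # not BzK selected
]
low_fraction, high_fraction = interloper_fractions(toy_sample, k_lim=23.0)
```

In the toy sample four galaxies are BzK selected, of which one lies below and one above the target redshift range, giving fractions of 0.25 each.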



Although we have not included the effect of inter-galactic
medium (IGM) attenuation in this particular lightcone, we have
checked its effect on the galaxy (B − z) and (z − K) colours. We
have examined the (B − z) vs. (z − K) plane at discrete redshift
snapshots: z ∼ 2.0, z ∼ 2.5 and z ∼ 3. We find that the IGM
attenuation has only a modest effect on the number of BzK galaxies
at z > 3, which is well into the high redshift tail of the galaxy
redshift distribution.



5.4.3 Dependence on B-band depth 

As we have seen in § 5.2, there is evidence that the ability of the BzK
technique to distinguish between star-forming and passive galaxies
within 1.4 < z < 2.5 is dependent upon the B-band depth of
the galaxy sample. For example, Grazian et al. (2007) determined
that 22 per cent of their sample of sBzK galaxies had SEDs typical
of passive galaxies rather than star-forming galaxies. A significant
number of these galaxies were undetected in the B-band and had
their (B − z) colours estimated using a 1σ B-band upper limit,
which resulted in their (B − z) colours being too blue. Grazian
et al. concluded that, for faint K-band selected galaxies with very
red (z − K) colours, a lack of deep B-band photometry will lead to
many pBzK galaxies being incorrectly classified as sBzK galaxies.

In Fig. 13 we show the variation of the median B-band apparent
magnitude with position in the (B − z) vs. (z − K) plane for
K_AB ≤ 23 galaxies in the GALFORM mock catalogue. The trend
towards fainter B-band magnitudes for redder (B − z) and (z − K)
colours is immediately clear and supports the need for deep B-band
photometry to probe the faint pBzK population.

Figure 11. The distribution of synthetic galaxies in the BzK colour plane for two K-band flux limits, K_AB ≤ 21 (top row) and K_AB ≤ 23 (bottom row). The
columns correspond to three different redshift ranges: z ≤ 1.4 (left), 1.4 < z < 2.5 (the redshift interval which the BzK technique was designed to select,
middle) and z ≥ 2.5 (right). The black solid line and dashed line correspond to the sBzK and pBzK cuts of Daddi et al. (2004a) respectively. The colour
shading indicates the surface density of galaxies on the mock sky, as shown by the scale on the right-hand side.

We apply a B-band detection limit of B_AB,lim = 26 to a
K_AB ≤ 23 sample of galaxies, by assuming that galaxies with B-band
magnitudes fainter than B_AB,lim are undetected and so have
B_AB = B_AB,lim. By doing this, we find that only ∼ 0.3 per cent
of galaxies within 1.4 < z < 2.5 are classified as pBzK galaxies.
Making the B-band limit fainter leads to a larger fraction of pBzK
galaxies. With upper limits of B_AB,lim = 27, 28 and 29 we find that
∼ 3, ∼ 9 and ∼ 15 per cent of 1.4 < z < 2.5 galaxies respectively
are classified as pBzK galaxies. An upper limit of B_AB,lim = 30
leads to the same number of pBzK galaxies being recovered (∼ 16
per cent) as when applying the K_AB ≤ 23 selection in isolation.

A bright B-band limit will also lead to galaxies that should not
be classified as BzK galaxies being scattered into the sBzK region
of the (B − z) vs. (z − K) plane. As we have seen in § 5.4.1, for
a K_AB ≤ 23 selected galaxy sample, the BzK technique selects
∼ 80 per cent of 1.4 < z < 2.5 galaxies. If we apply B-band
detection limits of B_AB,lim = 26, 27 and 28 we find that the BzK
technique selects ∼ 95, ∼ 87 and ∼ 80 per cent of galaxies within
1.4 < z < 2.5 respectively.

We conclude that adopting a fainter B-band limit should im- 
prove the ability of the BzK technique to distinguish between star- 
forming and passively evolving galaxies. 
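
The effect of a B-band detection limit on the classification can be illustrated with a minimal sketch. It assumes the commonly quoted form of the Daddi et al. (2004a) cuts, with BzK = (z - K)_AB - (B - z)_AB, sBzK galaxies at BzK ≥ -0.2 and pBzK galaxies at BzK < -0.2 with (z - K)_AB > 2.5. The function name `classify_bzk` and the example magnitudes are hypothetical and are not taken from the mock catalogue.

```python
import numpy as np


def classify_bzk(B, z, K, B_lim=None):
    """Label galaxies as 'sBzK', 'pBzK' or 'none' from AB magnitudes.

    If B_lim is given, galaxies fainter than B_lim in the B-band are treated
    as undetected and assigned B = B_lim, as described in the text.
    The colour cuts are the assumed Daddi et al. (2004a) criteria.
    """
    B = np.atleast_1d(np.asarray(B, dtype=float))
    z = np.atleast_1d(np.asarray(z, dtype=float))
    K = np.atleast_1d(np.asarray(K, dtype=float))

    if B_lim is not None:
        # Magnitudes fainter (numerically larger) than the limit are clipped.
        B = np.minimum(B, B_lim)

    b_minus_z = B - z
    z_minus_k = z - K
    bzk = z_minus_k - b_minus_z

    labels = np.full(B.shape, "none", dtype=object)
    labels[bzk >= -0.2] = "sBzK"
    labels[(bzk < -0.2) & (z_minus_k > 2.5)] = "pBzK"
    return labels


# Hypothetical example: the first galaxy is red in both colours.
B = [28.0, 25.0]
z = [24.5, 24.0]
K = [21.5, 22.5]
deep = classify_bzk(B, z, K)                # no B-band limit applied
shallow = classify_bzk(B, z, K, B_lim=26)   # B clipped at 26
```

In this toy example the first galaxy is a pBzK galaxy with deep photometry but moves into the sBzK region once its B-band magnitude is clipped at the limit, which is the kind of confusion discussed above.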



5.5 The predicted properties of BzK galaxies 

5.5.1 Stellar mass

From the upper left panel of Fig. [5] we can see that the predicted 
stellar masses of galaxies in a K_AB ≤ 23 selected BzK sample
range from 10^h~^MQ to lO^^/i"^M0, with the more mas- 
sive galaxies typically having redder (z — K) colours. 

We show in Fig. [14] the distribution of stellar masses for all K-band selected galaxies
(within 1.4 < z < 2.5). The median stellar mass of BzK selected galaxies is in excellent
agreement with the distribution for all K-band selected galaxies for all flux limits fainter
than K_AB,lim ~ 21. Additionally, the 10 and 90 percentiles of the BzK distribution
consistently match the 10 and 90 percentiles for the stellar mass distribution of the whole
galaxy population.

Early studies of BzK galaxies, using K-band limits of K_AB ≤ 22, inferred BzK galaxies to be
very massive, with typical stellar masses of > 5 × 10^10 h^-1 M_⊙ (Daddi et al. 2004b,a,
2005b,a; Reddy et al. 2005; Kong et al. 2006; Blanc et al. 2008). In Fig. [14] we show the
median stellar mass of BzK selected galaxies (within 1.4 < z < 2.5) as a function of the
K-band flux limit. For K_AB ≤ 22, the distribution of BzK stellar masses in the mock catalogue is



^ The quoted value for the observed mass has been multiplied by a factor of 1.4
(Fontana et al. 2004) in order to account for the change from the Salpeter (1955) IMF, used
in observational studies, to the Kennicutt (1983) IMF used for the study presented here.
Figure 12. The efficiency of the BzK selection as a function of K-band limiting magnitude,
K_AB,lim. The solid line shows the predicted fraction of GALFORM galaxies within
1.4 < z < 2.5, with K_AB ≤ K_AB,lim, that are identified as BzK galaxies. The filled circles
correspond to completeness estimates for observed galaxies in the WIRCam Deep Survey
(WIRDS, Bielby et al. 2011) that have been calculated in the same way as the GALFORM
predictions. The error bars shown correspond to Poisson errors. The dashed and dotted lines
show the predicted fraction of interlopers at z ≤ 1.4 and z ≥ 2.5 respectively, as a function
of K-band limiting magnitude.




Figure 13. The variation in the median B-band apparent magnitude of galaxies with position in
the (B - z) vs. (z - K) plane. The distribution shown corresponds to GALFORM galaxies, within
1.4 < z < 2.5, selected to have K_AB ≤ 23 and placed into 2-dimensional bins spanning the BzK
colour-colour space. The bins are coloured according to the median B-band magnitude of the
galaxies in that bin, as shown by the colourbar.
Figure 14. The predicted stellar mass of galaxies with redshift 1.4 < z < 2.5, as a function
of K-band limiting magnitude for all BzK galaxies (black circles), sBzK galaxies (blue, filled
squares) and pBzK galaxies (red, open squares). Data points correspond to median values and
error bars show the 10 and 90 percentiles. For clarity, the data points for the sBzK and pBzK
values have been offset horizontally. The light and dark grey regions show the 10-90 and
40-60 percentiles for all galaxies brighter than the K-band flux limit (i.e. irrespective of
whether they are BzK selected).



consistent with observations. Increasing the depth in the K-band leads to a shift in the
distribution towards smaller stellar masses, with a median BzK stellar mass of
~10^10 h^-1 M_⊙ being reached at K_AB,lim ~ 23.5.

We also show in Fig. [14] the breakdown of the distribution 
into sBzK and pBzK galaxies. It is immediately clear that, typi- 
cally, pBzK galaxies are more massive than sBzK galaxies, with 
the difference between the medians increasing towards fainter K- 
band limits. 

We conclude that BzK selected galaxies appear to provide a 
representative sample of the galaxy stellar masses at 1.4 < z < 2.5 
and do not appear to be significantly biased towards either very high 
or low mass galaxies. 
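
The comparison made in this subsection, between the median and the 10 and 90 percentiles of a property for BzK selected galaxies and for all K-band selected galaxies at each flux limit, can be sketched as follows. This is only an illustrative outline: the function `summarise_by_limit` and its argument names are hypothetical, and the input arrays stand in for columns of a mock catalogue already restricted to 1.4 < z < 2.5.

```python
import numpy as np


def summarise_by_limit(K_mag, prop, is_bzk, limits):
    """Return (10th, 50th, 90th) percentiles of `prop` for each K-band limit.

    For every limit, statistics are computed for all galaxies brighter than
    the limit ("all") and for the subset flagged as BzK selected ("BzK").
    An entry is None when no galaxy passes the cut.
    """
    K_mag = np.asarray(K_mag, dtype=float)
    prop = np.asarray(prop, dtype=float)
    is_bzk = np.asarray(is_bzk, dtype=bool)

    rows = []
    for lim in limits:
        bright = K_mag <= lim  # brighter than the flux limit
        row = {"K_lim": lim}
        for name, mask in (("all", bright), ("BzK", bright & is_bzk)):
            if mask.any():
                row[name] = tuple(np.percentile(prop[mask], [10, 50, 90]))
            else:
                row[name] = None
        rows.append(row)
    return rows


# Hypothetical toy catalogue: property could be log10 stellar mass.
rows = summarise_by_limit(
    K_mag=[20.0, 21.0, 22.0, 23.0],
    prop=[4.0, 3.0, 2.0, 1.0],
    is_bzk=[False, True, True, True],
    limits=[21.0, 23.0],
)
```

Comparing the "BzK" and "all" entries limit by limit gives the kind of agreement, or bias, that is read off Fig. [14] for stellar mass.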



5.5.2 Star Formation Rate 

As we have already seen in Fig. [5] there is a clear trend in the predicted SFR of galaxies
across the (B - z) vs. (z - K) plane. In the extremes of the distribution we find that many
sBzK selected galaxies are predicted to have SFRs of ~100 h^-1 M_⊙ yr^-1 or more, while many
pBzK selected galaxies have SFRs of effectively zero. We find that the trend in the specific
star-formation rate (sSFR, equal to the star-formation rate of a galaxy divided by its stellar
mass) across the (B - z) vs. (z - K) plane is almost identical to that of the star-formation
rate.

In Fig. [15] we show the distribution of SFRs for BzK, sBzK and pBzK galaxies, as well as for
all K-band selected galaxies, as a function of K-band flux limit, in the redshift range
1.4 < z < 2.5.

For K_AB ≤ 21, the median SFR for BzK galaxies is in reasonable agreement with the
distribution for the whole galaxy population, though is perhaps slightly biased towards higher
SFRs. This would, at first, suggest that the BzK selection is missing a fraction of the
passive galaxy population, particularly since we have shown in Fig. [8] that the GALFORM mock
catalogue matches the number of sBzK galaxies but under-predicts the number of bright pBzK
galaxies. It is possible that some fraction of these faint pBzK galaxies are dusty
star-forming galaxies that have been mis-classified as being passive. For K_AB ≤ 21, we find
that ~20 per cent of the pBzK selected galaxies in the GALFORM mock catalogue have
SFRs > 0.1 h^-1 M_⊙ yr^-1. Interestingly, the
typical SFR of pBzK galaxies remains approximately constant at 10" 10"^/i"^M with increasing
K-band depth (though the distribution is very broad). For K_AB,lim ≳ 23, the typical SFR of
sBzK galaxies also appears to remain almost constant at ~1-10 h^-1 M_⊙ yr^-1.

In Fig. [15] we see that the model predicts the median SFR of BzK galaxies with K_AB ≤ 22 to
be ~1 h^-1 M_⊙ yr^-1. However, observational studies of BzK galaxies with K_AB ≤ 22 concluded
many of these bright BzK galaxies to be starbursting galaxies, with SFRs of
~50-100 h^-1 M_⊙ yr^-1 (e.g. Daddi et al. 2004a; Kong et al. 2006; Blanc et al. 2008). A
possible explanation for this discrepancy is the overly efficient shut down of gas cooling by
AGN feedback in the Bower et al. (2006) model, which has been previously suggested by
Gonzalez-Perez et al. (2009).

Based upon the model, however, we predict that, towards fainter K-band limiting magnitudes,
the BzK technique is typically selecting galaxies with SFRs that are consistent with the
median SFRs of the galaxy population within 1.4 < z < 2.5.

We note that a trend similar to that seen in the predicted median SFR is seen in the median
values of the sSFR of BzK selected galaxies. At faint K-band limits, the median sSFR of BzK
and sBzK galaxies tends towards ~10^-10 yr^-1. The GALFORM
model predicts this value to be typical for K-band selected galaxies
at 1.4 < z < 2.5. As with the median SFR, the median sSFR tends 
towards a constant value of — 10~^^ yr~^. 



5.5.3 Metallicity 

We have already seen in § 5.1 that the metallicity of K-band selected galaxies varies with
position in the (B - z) vs. (z - K) plane. From the lower left-hand panel of Fig. [5] we can
see that galaxies with the reddest (z - K) colours (typically pBzK and faint sBzK galaxies)
are in general the most metal rich.

In Fig. [16] we show the metallicity distribution for BzK selected galaxies within
1.4 < z < 2.5. For all K-band limits considered the metallicity distribution of BzK selected
galaxies is in good agreement with the metallicity distribution for all K-band selected
galaxies. The trend in the metallicity distribution as a function of K-band flux limit is very
similar to the trend seen in the stellar mass distribution in Fig. [14]. For brighter K-band
flux limits, one would expect to recover BzK galaxies with higher metallicities. For the
brightest flux limit considered, the median metallicity for BzK galaxies falls below that for
all galaxies. As with the stellar mass distribution, this is due to GALFORM under-predicting
the counts of bright pBzK galaxies, which one would expect to be metal-rich. The distributions
for the separate sBzK and pBzK subsets show that for any K-band depth, pBzK galaxies will
typically be more metal-rich than sBzK galaxies, though the distributions for the two subsets
do overlap.
Figure 15. The predicted star-formation rate as a function of K-band limiting magnitude for
galaxies in 1.4 < z < 2.5. The symbols and shaded regions are the same as in Fig. [14].



5.5.4 Age 

In the lower right-hand panel of Fig. [5] we show the median stellar mass weighted age of
galaxies in the (B - z) vs. (z - K) plane. We find that the oldest galaxies occupy the region
where the density of passive galaxies peaks just below the pBzK region, as can be seen in the
middle column of Fig. [11]. The vast majority of the oldest galaxies, with ages above 2 Gyr,
that fall outside the pBzK region lie within the redshift interval 1.4 < z < 2.5. This is due
to the finite width of the 4000 Å break, which at z ~ 1.4 is beginning to enter the response
curve of the z-band, thus making the (z - K) colours of these galaxies bluer. Above z = 2, all
of the galaxies with ages above approximately 1.5 Gyr lie well within the pBzK region on the
colour plane. We have checked that the (z - K) colours of the galaxies are not significantly
affected by changing between a Kennicutt (1983) and a Salpeter (1955) IMF.

We show in Fig. [17] the distribution of the stellar mass weighted ages for all K-band
selected galaxies and for those that are BzK-selected. Like the distribution of SFRs, the
distribution of ages of BzK galaxies is in reasonable agreement with the age distribution for
all K-band selected galaxies, though appears to be slightly biased towards younger galaxies.
As with the SFR distribution, we see that the typical ages of sBzK and pBzK galaxies remain
approximately constant (at ~1.1 Gyr and ~1.7 Gyr respectively) for K_AB,lim > 22.



5.5.5 Dust 

Reddening due to dust can mimic a large break at 4000 Å in the spectra of star-forming
z < 1.4 galaxies (Kriek et al. 2006, 2011). However, many authors have argued that the
effectiveness of the BzK colour selection is not significantly affected by dust extinction
(e.g. Daddi et al. 2004a; Kong et al. 2006; Hayashi et al. 2007;
Figure 16. The predicted stellar metallicity as a function of K-band limiting magnitude for
galaxies in 1.4 < z < 2.5. The symbols and shaded regions are the same as in Fig. [14].

Figure 17. The predicted stellar mass weighted age as a function of K-band limiting magnitude
for galaxies in 1.4 < z < 2.5. The symbols and shaded regions are the same as in Fig. [14].



Grazian et al. 2007; Hartley et al. 2008; Hayashi et al. 2009; Lin et al. 2011).

For the K-band limits considered in Fig. we find that in 
the presence of dust the distribution of 1.4 < z < 2.5 galaxies in 
the (B — z) vs. (z — K) plane remains relatively unchanged, aside 
from an increased scatter in the sBzK galaxy population towards 
redder (z - K) colours. We find that for K_AB ≤ 23 the presence of dust reddens the colours
of BzK galaxies, within 1.4 < z < 2.5, by Δ(B - z) ~ 0.15 and Δ(z - K) ~ 0.3. The presence of
dust appears to have a greater effect on the median colours of sBzK galaxies, as we find a
negligible change in the median colours of pBzK galaxies. We can see this also in Fig. [8]
where the number counts of sBzK galaxies without dust extinction are boosted by 1 dex, while
the pBzK number counts remain the same. The reduction in the sBzK counts when dust extinction
is included is likely due to dust reddening the (B - z) colours of star-forming galaxies (with
(z - K)_AB < 2.5) and scattering them out of the sBzK region.



6 CONCLUSIONS 

We have presented a method for constructing end-to-end mock 
galaxy catalogues by applying a semi-analytical model of galaxy 
formation to the halo merger trees extracted from a cosmological 
N-body simulation. The mocks that we construct are lightcone cata- 
logues, in which a galaxy is placed according to the epoch at which 
it first enters the past lightcone of the observer. Thus our catalogues 
incorporate the evolution of galaxy properties that is predicted over 
the simulation snapshots. We use interpolation to determine the po- 
sitions of galaxies at epochs intermediate to the simulation snap- 
shots, which represents an improvement over previous work. We 
have shown that our adopted interpolation scheme leads to accu- 
rate predictions for real space galaxy clustering down to scales well 
within the one-halo regime. 



We can summarise our method for constructing lightcone cat- 
alogues as follows: 

(i) Populate the dark matter halos in the snapshot outputs of a cosmological N-body
simulation with galaxies using a physical model of galaxy formation, giving populations of
galaxies at a range of cosmic epochs. Here we use the dark matter halos from the Millennium
Simulation (Springel et al. 2005), which we populate with galaxies, whose positions and
properties are calculated using the GALFORM semi-analytical model. (In this work we adopt the
Bower et al. (2006) version of GALFORM.)

(ii) Position an observer within the simulation box. Replicate the 
simulation box to span a cosmological volume that is of sufficient 
size to encompass the galaxy survey that we wish to mimic. 

(iii) For each replication of the box, use adjacent pairs of simulation
snapshots to determine the epoch at which each galaxy enters the 
observer's past lightcone. Use interpolation to determine the cor- 
responding position of the galaxy at this epoch. Reject all galaxies 
that enter the observer's lightcone at a position outside of the solid- 
angle of the galaxy survey. 

(iv) Assign each galaxy that enters the lightcone the intrin- 
sic properties that the galaxy had at the lowest redshift snapshot 
prior to the galaxy entering the lightcone. Use the position of the 
galaxy to convert luminosities and absolute magnitudes into ob- 
served fluxes and apparent magnitudes. Reject all galaxies that fall 
outside of the flux limits which define the galaxy survey. 
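
Step (iii) can be illustrated with a minimal sketch of how a lightcone crossing might be located between two snapshots. The sketch assumes, purely for illustration, that both the galaxy position and the comoving radius of the observer's past lightcone vary linearly between the two snapshots; the interpolation scheme adopted in this paper may differ. The function name `lightcone_crossing` and its arguments are hypothetical.

```python
import numpy as np


def lightcone_crossing(x_early, x_late, r_early, r_late,
                       observer=(0.0, 0.0, 0.0), tol=1e-10):
    """Find where a galaxy crosses the observer's past lightcone.

    x_early, x_late : comoving positions at the earlier (higher redshift)
                      and later snapshot of an adjacent pair.
    r_early, r_late : comoving radius of the lightcone at those snapshots
                      (r_early > r_late, since the lightcone shrinks
                      towards the present day).
    Returns (f, position), where f in [0, 1] is the fractional step from
    the early to the late snapshot, or None if there is no crossing.
    Assumption: linear interpolation of position and lightcone radius.
    """
    obs = np.asarray(observer, dtype=float)
    x0 = np.asarray(x_early, dtype=float) - obs
    x1 = np.asarray(x_late, dtype=float) - obs

    def gap(f):
        # Galaxy distance from the observer minus the lightcone radius.
        pos = x0 + f * (x1 - x0)
        return np.linalg.norm(pos) - (r_early + f * (r_late - r_early))

    # A crossing in this interval requires the galaxy to be inside the
    # lightcone radius at the early snapshot and outside it at the late one.
    if gap(0.0) > 0.0 or gap(1.0) <= 0.0:
        return None

    lo, hi = 0.0, 1.0
    while hi - lo > tol:  # bisection on the sign change of gap(f)
        mid = 0.5 * (lo + hi)
        if gap(mid) <= 0.0:
            lo = mid
        else:
            hi = mid
    f = 0.5 * (lo + hi)
    return f, obs + x0 + f * (x1 - x0)


# Hypothetical example in arbitrary comoving length units.
result = lightcone_crossing([100.0, 0.0, 0.0], [102.0, 0.0, 0.0], 110.0, 90.0)
```

A galaxy for which no crossing is found in a given snapshot pair (the function returns None) would simply be tested again against the neighbouring pairs of snapshots.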

Our approach has a number of attractive features. First, we use a physical model of galaxy
formation which makes ab initio predictions. This means that we can build mocks for epochs or
selections which are currently unprobed. Empirical approaches are not able to do this, as they
depend on the existence of observations. Second, our construction method is generic and is not
tied to a particular choice of N-body simulation or semi-analytic model. As better N-body
simulations or more accurate galaxy formation models become available, our method can still be
used. Third, the semi-analytic model that we have used has a unique multi-wavelength
capability, which means that we can mimic surveys built using many different telescopes such
as GAMA.

As an illustrative application of our method we considered the effectiveness of the BzK
colour selection technique which is designed to isolate galaxies within the redshift range
1.4 < z < 2.5 (Daddi et al. 2004a). The aim of this exercise is to determine how
successful this technique is at isolating galaxies within the target 
redshift range and whether the galaxies it selects are representative 
of the target population or a biased subsample. 

The GALFORM model is able to match reasonably well the K-band number counts of all BzK
galaxies, as well as the counts of sBzK galaxies. However, the model under-predicts the number
of bright pBzK galaxies and over-predicts the number of faint pBzK galaxies. The latter
discrepancy is partially due to the effect of the depth of B-band photometry, but may also be
related to the crude estimate of the stripping of gas from satellite galaxies that is carried
out in the Bower et al. (2006) model. The BzK technique successfully selects the majority of
the galaxy population within 2 < z < 2.5 (and possibly out as far as z ~ 3), though is less
efficient for 1.4 < z < 2.0. Examination of the effectiveness of the BzK technique as a
function of K-band limiting magnitude suggests that the technique recovers > 75 per cent of
the 1.4 < z < 2.5 galaxy population for K-band limits fainter than K_AB ~ 22. For brighter
limits the completeness decreases substantially as the BzK population becomes dominated by low
redshift interlopers with z ≤ 1.4. For magnitude limits K_AB ≳ 21.5, the fraction of
contamination from BzK galaxies outside 1.4 < z < 2.5 remains approximately constant at ~30
per cent. We have also shown that a variation in the typical B-band magnitude across the BzK
plane can lead to the mis-classification of pBzK galaxies as sBzK galaxies if the B-band
photometry is of insufficient depth. Finally, we considered the intrinsic properties of BzK
galaxies, including their stellar mass, SFR, metallicity and stellar mass weighted age. We
find that BzK galaxies display distributions of these various properties that are in good
agreement with the corresponding distributions for all galaxies with K_AB ≳ 20.5. However, at
brighter K-band limits BzK galaxies appear to be less massive, more star-forming, less
metal-rich and younger than the overall population. This is likely related to the
under-prediction of the bright pBzK number counts. The presence of dust increases the scatter
in the colours of (faint) sBzK galaxies, though does not dramatically change the colour
distribution of galaxies within 1.4 < z < 2.5.
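
The completeness and interloper fractions summarised above can be computed from any catalogue that provides redshifts, K-band magnitudes and a BzK selection flag. The following is a minimal sketch under stated assumptions: completeness is taken as the fraction of galaxies within the target redshift range, brighter than the K-band limit, that are BzK selected, and the interloper fractions are taken relative to the number of BzK selected galaxies brighter than the limit. The function `selection_stats` and the toy inputs are hypothetical.

```python
import numpy as np


def selection_stats(redshift, K_mag, is_bzk, K_lim, z_min=1.4, z_max=2.5):
    """Completeness and interloper fractions of a BzK sample at one K limit."""
    redshift = np.asarray(redshift, dtype=float)
    K_mag = np.asarray(K_mag, dtype=float)
    is_bzk = np.asarray(is_bzk, dtype=bool)

    bright = K_mag <= K_lim                        # passes the flux limit
    target = bright & (redshift > z_min) & (redshift < z_max)
    chosen = bright & is_bzk                       # BzK selected, any redshift

    n_target = target.sum()
    n_chosen = chosen.sum()
    nan = float("nan")

    return {
        # Fraction of the target population recovered by the selection.
        "completeness": (target & is_bzk).sum() / n_target if n_target else nan,
        # Fractions of the selected sample lying outside the target range.
        "low_z_interlopers": (chosen & (redshift <= z_min)).sum() / n_chosen
        if n_chosen else nan,
        "high_z_interlopers": (chosen & (redshift >= z_max)).sum() / n_chosen
        if n_chosen else nan,
    }


# Hypothetical toy catalogue of five galaxies.
stats = selection_stats(
    redshift=[1.0, 1.5, 2.0, 2.2, 3.0],
    K_mag=[20.0, 21.0, 22.0, 22.5, 22.8],
    is_bzk=[True, True, False, True, True],
    K_lim=23.0,
)
```

Evaluating such a function over a grid of K-band limits would give curves analogous to those described for the efficiency of the selection, with the caveat that the normalisation of the interloper fraction used here is an assumption.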

We conclude that the BzK colour selection does provide a rep- 
resentative sample of the 1.4 < z < 2.5 population, working better 
for fainter K-band flux limits. However, the depth of B-band pho- 
tometry and extinction due to dust may lead to confusion between 
the sBzK and pBzK subsets. 

The tool that we have developed in this paper is a valu- 
able resource to aid in the exploitation of a wide range of sur- 
veys, from traditional optical selection to novel properties, such 
as the neutral hydrogen content of galaxies. Lightcone mock catalogues for different surveys
will be made available for download at http://www.dur.ac.uk/a.i.merson/lightcones.html